Abstract

We consider the problem of computing the $q \mapsto p$ norm of a matrix $A$, which is defined for
$p, q \ge 1$, as

\[ \|A\|_{q \mapsto p} = \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_q}. \]

This is in general a non-convex optimization problem, and is a natural generalization of
the well-studied question of computing singular values (this corresponds to $p = q = 2$). Different
settings of parameters give rise to a variety of known interesting problems (such as the
Grothendieck problem when $p = 1$ and $q = \infty$). However, very little is understood about the
approximability of the problem for different values of $p, q$.

Our first result is an efficient algorithm for computing the $q \mapsto p$ norm of matrices with
non-negative entries, when $q \ge p \ge 1$. The algorithm we analyze is based on a natural fixed
point iteration, which can be seen as an analog of power iteration for computing eigenvalues.

We then present an application of our techniques to the problem of constructing a scheme
for oblivious routing in the $\ell_p$ norm. This makes constructive a recent existential result of
Englert and Räcke [ER09] on $O(\log n)$ competitive oblivious routing schemes (which they make
constructive only for $p = 2$).

On the other hand, when we do not have any restrictions on the entries (such as non-negativity),
we prove that the problem is NP-hard to approximate to any constant factor, for
$2 < p \le q$ and $p \le q < 2$ (these are precisely the ranges of $p, q$ with $p \le q$ where constant
factor approximations are not known). In this range, our techniques also show that if
$\mathrm{NP} \not\subseteq \mathrm{DTIME}(n^{\mathrm{polylog}(n)})$, the problem cannot be approximated to a factor $2^{(\log n)^{1-\varepsilon}}$, for any constant
$\varepsilon > 0$.
1 Introduction



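We study the problem of computing norms of matrices. The $\ell_q$ to $\ell_p$ norm of a matrix $A \in \mathbb{R}^{m \times n}$
is defined to be

\[ \|A\|_{q \mapsto p} = \max_{x \in \mathbb{R}^n,\, x \neq 0} \frac{\|Ax\|_p}{\|x\|_q}, \quad \text{where } \|x\|_p = \left( |x_1|^p + \cdots + |x_n|^p \right)^{1/p}. \]

Throughout, we think of $p, q \ge 1$. If we think of the matrix as an operator from $\mathbb{R}^n$ with the $\ell_q$
norm to the space $\mathbb{R}^m$ with $\ell_p$ norm, the norm $\|A\|_{q \mapsto p}$ measures the 'maximum stretch' of a unit
vector.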

Computing the $q \mapsto p$ norm of a matrix is a natural optimization problem. For instance, it
can be seen as a natural generalization of the extensively studied problem of computing the largest
singular value of a matrix [HJ85]. This corresponds to the case $p = q = 2$. When $p = 1$ and $q = \infty$,
it turns out to be the well-studied Grothendieck problem [Gro53, AN04], which is defined as

\[ \max_{x_i, y_j \in \{-1, 1\}} \sum_{i,j} a_{ij} x_i y_j. \]

Thus for different settings of the parameters, the problem seems to have very different flavors. 
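As a small illustration of the $p = 1$, $q = \infty$ case, the following sketch checks by brute force that the Grothendieck-style maximum over sign vectors agrees with $\max \|Ax\|_1$ over sign vectors $x$ (the vertices of the $\ell_\infty$ unit ball). The function names and the test matrix are assumptions made for this example, and the enumeration is exponential in the dimension, so it is only meant for tiny matrices.

```python
import itertools
import numpy as np


def grothendieck_bruteforce(A):
    """Maximize sum_ij a_ij * y_i * x_j over all sign vectors x, y."""
    m, n = A.shape
    best = -np.inf
    for x in itertools.product((-1.0, 1.0), repeat=n):
        Ax = A @ np.array(x)
        for y in itertools.product((-1.0, 1.0), repeat=m):
            best = max(best, float(np.array(y) @ Ax))
    return best


def norm_inf_to_1_bruteforce(A):
    """Maximize ||Ax||_1 over sign vectors x."""
    n = A.shape[1]
    return max(
        float(np.abs(A @ np.array(x)).sum())
        for x in itertools.product((-1.0, 1.0), repeat=n)
    )


# Hypothetical example matrix (negative entries allowed).
A = np.array([[1.0, -2.0, 0.5],
              [-1.5, 1.0, 2.0],
              [0.5, 0.5, -1.0]])
g_value = grothendieck_bruteforce(A)
n_value = norm_inf_to_1_bruteforce(A)
```

The two values coincide because, for a fixed $x$, the best sign vector $y$ simply matches the signs of $Ax$.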

We study the question of approximating $\|A\|_{q \mapsto p}$ for different ranges of the parameters $p, q$.
The case $p = q$ is referred to as the matrix $p$-norm (denoted by $\|A\|_p$), and has been considered
in the scientific computing community. For instance, it is known to have connections with matrix
condition number estimates (see [Hig92] for other applications). Computing $\|A\|_{q \mapsto p}$ has also been
studied because of its connections to robust optimization [Ste05]. Another special case which has
been studied [Boy74, Ste05] is one where the entries of the matrix $A$ are restricted to be non-negative.
Such instances come up in graph theoretic problems, like in the $\ell_p$ oblivious routing
question of [ER09].

Note that computing the matrix $q \mapsto p$ norm is a problem of maximizing a convex function over
a convex domain. While a convex function can be minimized efficiently over convex domains using
gradient descent based algorithms, it is in general hard to maximize them. Thus it is interesting that
our algorithm can efficiently compute the norm for non-negative matrices for a range of parameters.



Known algorithms. Very little is known about approximating $\|A\|_{q \mapsto p}$ in general. For computing
$p$-norms (i.e., $q = p$), polynomial time algorithms for arbitrary $A$ are known to exist only for
$p = 1, 2$, and $\infty$. For the general problem, for $p \le 2$, $q \ge 2$, Nesterov [Nes98] shows that the problem
can be approximated to a constant factor (which can be shown to be $< 2.3$), using a semidefinite
programming relaxation. When the matrix has only non-negative entries, this relaxation can be
shown to be exact [Ste05].

For other ranges of $p, q$, the best known bounds are polynomial factor approximations, obtained
by 'interpolating'. For instance, for computing $\|A\|_{p \mapsto p}$, computing the vectors that maximize
the norm for $p = 1, 2, \infty$, and picking the best of them gives an $O(n^{1/4})$ approximation for all $p$
(see [Hig92]). For the general problem of computing $\|A\|_{q \mapsto p}$, Steinberg [Ste05] gives an algorithm
with an improved guarantee of $O(n^{25/128})$, by taking into account the approximation algorithms of
Nesterov for certain ranges.
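The 'interpolating' idea for the matrix $p$-norm can be sketched as follows: take the exact maximizers for $p = 1$, $2$ and $\infty$, evaluate the ratio $\|Ax\|_p / \|x\|_p$ at each of them for the $p$ of interest, and keep the best. This is a minimal sketch; the function names are assumptions for this example, and the returned value is only a lower bound on $\|A\|_{p \mapsto p}$.

```python
import numpy as np


def p_norm_ratio(A, x, p):
    """The objective ||Ax||_p / ||x||_p at a given vector x."""
    return np.linalg.norm(A @ x, p) / np.linalg.norm(x, p)


def candidate_vectors(A):
    """Maximizers of the ratio for p = 1, p = 2 and p = infinity."""
    n = A.shape[1]
    # p = 1: standard basis vector of the column with largest absolute sum.
    e = np.zeros(n)
    e[np.abs(A).sum(axis=0).argmax()] = 1.0
    # p = 2: top right singular vector.
    v = np.linalg.svd(A)[2][0]
    # p = infinity: sign pattern of the row with largest absolute sum.
    s = np.sign(A[np.abs(A).sum(axis=1).argmax()])
    s[s == 0] = 1.0
    return [e, v, s]


def interpolated_lower_bound(A, p):
    """Best ratio among the three candidates: a lower bound on ||A||_p."""
    return max(p_norm_ratio(A, x, p) for x in candidate_vectors(A))


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))
bound_p3 = interpolated_lower_bound(A, 3)
```

For $p \in \{1, 2, \infty\}$ the bound is tight, since one of the candidates is an exact maximizer in each of those cases.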

These algorithms use Hölder's inequality, and a fact which follows from the duality of $\ell_p$ spaces
(this sometimes allows one to 'move' from one range of parameters to another):

\[ \|A\|_{q \mapsto p} = \|A^T\|_{p' \mapsto q'}, \]

where $A^T$ is the transpose of the matrix $A$, and $p'$ and $q'$ are the 'duals' of $p, q$ respectively (i.e.
$1/p + 1/p' = 1$). See Appendix A.2 for a proof.
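The duality relation can be sanity-checked numerically in the cases where the norm has a closed form. The sketch below uses $q = p = 1$ (whose duals are $q' = p' = \infty$) and the self-dual case $q = p = 2$; the helper names and the random test matrix are assumptions for this example.

```python
import numpy as np


def norm_1_to_1(A):
    """||A||_{1 -> 1}: the largest absolute column sum."""
    return np.abs(A).sum(axis=0).max()


def norm_inf_to_inf(A):
    """||A||_{inf -> inf}: the largest absolute row sum."""
    return np.abs(A).sum(axis=1).max()


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))

# q = p = 1 on the left, p' = q' = infinity on the right.
lhs_1 = norm_1_to_1(A)
rhs_inf = norm_inf_to_inf(A.T)

# q = p = 2 is its own dual: A and A^T share their largest singular value.
lhs_2 = np.linalg.norm(A, 2)
rhs_2 = np.linalg.norm(A.T, 2)
```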
The hardness front. The problem is known to be NP-hard in the range $q \ge p \ge 1$ [Ste05]. Very
recently in independent work, [HQ09] show that it is NP-hard to compute the $p$-norm to arbitrary
relative precision when $p \notin \{1, 2, \infty\}$ (i.e., there cannot be a $(1 + \delta)$ approximation algorithm with
run time $\mathrm{poly}(n, m, 1/\delta)$).



1.1 Our Results 

Non-negative matrices. We first consider the case of matrices $A$ with non-negative entries.
Here we prove that if $1 \le p \le q$, then $\|A\|_{q \mapsto p}$ can be computed in polynomial time. More precisely,
we give an algorithm which gives a $(1 + \delta)$ approximation in time polynomial in $n$, $m$, and $1/\delta$.

Thus in particular, we give the first poly time guarantee (to the best of our knowledge) for
computing the matrix $p$-norm for non-negative matrices. We give an analysis of a power iteration
type algorithm for computing $p$-norms proposed by Boyd [Boy74]. The algorithm performs a fixed
point computation, which turns out to mimic power iteration for eigenvalue computations.

Heuristic approaches to many optimization problems involve finding solutions via fixed point
computations. Our analysis proves polynomial convergence time for one such natural fixed point
algorithm. These techniques could potentially be useful in other similar settings. We believe that
this algorithm could be useful as an optimization tool for other problems with objectives that
involve $p$-norms (or as a natural extension of eigenvalue computations). We now mention one such
application, to oblivious routing in the $\ell_p$ norm.



Application to Oblivious Routing. In the oblivious routing problem, we are given a graph G, 
and we need to output a 'routing scheme', namely a unit flow between every pair of vertices. Now 
given a set of demands (for a multicommodity flow), we can route them according to this scheme 
(by scaling the flows we output according to the demand in a natural way), and the total flow on 
each edge is obtained. The aim is to compete (in terms of, for instance, max congestion) with the 
best multicommodity flow 'in hindsight' (knowing the set of demands). 

For max-congestion (maximum total flow on an edge, which is the $\ell_\infty$ norm of the vector of
flows on edges), a beautiful result of [R08] gives an $O(\log n)$ competitive routing scheme. Englert
and Räcke [ER09] recently showed that there exists an oblivious routing scheme which attains a
competitive ratio of $O(\log n)$ when the objective function is the $\ell_p$-norm of the flow vector ($|E|$
dimensional vector). However, they can efficiently compute this oblivious routing scheme only for
$p = 2$.

From the analysis of our algorithm, we can prove that for matrices with strictly positive entries
there is a unique optimum. Using this and a related idea (Section 4), we can make the result
of [ER09] constructive. Here matrix $p$-norm computation is used as a 'separation oracle' in a
multiplicative weights style update, and this gives an $O(\log n)$-competitive oblivious routing scheme
for all $\ell_p$-norms ($p \ge 1$).

Hardness of approximation. For general matrices (with negative entries allowed), we show the
inapproximability of almost polynomial factor for computing the $q \mapsto p$ norm of general matrices
when $q \ge p$ and both $p, q$ are $> 2$. By duality, this implies the same hardness when both $p, q$ are
$< 2$ and $q \ge p$.^1

More precisely, for these ranges, we prove that computing $\|A\|_{q \mapsto p}$ up to any constant factor is
NP-hard. Under the stronger assumption that $\mathrm{NP} \not\subseteq \mathrm{DTIME}(2^{\mathrm{polylog}(n)})$, we prove that the problem
is hard to approximate to a factor of $\Omega(2^{(\log n)^{1-\varepsilon}})$, for any constant $\varepsilon > 0$.

^1 When $p \le 2$ and $q \ge 2$, Nesterov's algorithm gives a constant factor approximation.
Techniques. We first consider $p \mapsto p$ norm approximation, for which we show constant factor
hardness by a gadget reduction from the gap version of MaxCut. Then we show that the $p \mapsto p$
norm multiplies upon tensoring, and thus we get the desired hardness amplification. While the
proof of the small constant hardness carries over to the $q \mapsto p$ norm case with $q > p > 2$, in general
these norms do not multiply under tensoring. We handle this by giving a way of starting with a
hard instance of $p \mapsto p$ norm computation (with additional structure, as will be important), and
converting it to one of $q \mapsto p$ norm computation.

We find the hardness results for computing the $q \mapsto p$ norm interesting because the bounds
are very similar to hardness of combinatorial problems like label cover, and it applies to a natural
numeric optimization problem.

The question of computing $\|A\|_{\infty \mapsto p}$ has a simple alternate formulation (Definition 6.9): given
vectors $a_1, a_2, \ldots, a_n$, find a $\{\pm\}$ combination of the vectors so as to maximize the length (in $\ell_p$
norm) of the resultant. The previous hardness result also extends to this case.

Comparison with previous work. For clarity, let us now tabulate our algorithmic and hardness
results in Tables 1 and 2 and show how they compare with known results for different values of the
parameters p, q. Each row and column in the table gives three things: the best known algorithm 
in general, the best known approximation algorithm when all entries of A are non-negative, and 
the best known hardness for this range of p, q. An entry saying "NP-hard" means that only exact 
polynomial time algorithms are ruled out. 

Discussion and open questions. All our algorithms and hardness results apply to the case
$p \le q$, but we do not know either of these (even for non-negative matrices) for $p > q$ (which is
rather surprising!). For algorithmic results (for positive matrices, say) the fact that we can optimize
$\|Ax\|_p / \|x\|_q$ seems closely tied to the fact that the set $\{x : \|x\|_p / \|x\|_q > \tau\}$ is convex for any
$\tau > 0$ and $p \le q$. However, we are not aware of any formal connection. Besides, when $p > q$, even
for non-negative matrices there could be multiple optima (we prove uniqueness of optimum when
$p \le q$).

On the hardness front, the $q < p$ case seems more related to questions like the Densest $k$-subgraph
problem (informally, when the matrix is positive and $p < q$, if there is a 'dense enough'
submatrix, the optimum vector would have most of its support corresponding to this). Thus the
difficulties in proving hardness for the norm question may be related to proving hardness for densest
subgraph.

Hypercontractive norms (corresponding to $q < p$) have been well-studied [KKL88], and have
also found prior use in inapproximability results for problems like maxcut. Also, known integrality
gap instances for unique games [KV05] are graphs that are hypercontractive. We believe that
computability of hypercontractive norms of a matrix could reveal insights into the approximability
of problems like small set expansion [RS10] and the planted dense $k$-subgraph problem [BCC+10].

1.2 Related work. 

A question that is very related to matrix norm computation is the $L_p$ Grothendieck problem, which
has been studied earlier by [KNS08]. The problem is to compute

\[ \max_{\|x\|_p \le 1} x^T B x. \]

The question of computing $\|A\|_{p \mapsto 2}$ is a special case of the $L_p$ Grothendieck problem (where $B \succeq 0$).
[KNS08] give an optimal (assuming UGC) $O(p)$ approximation algorithm. For $B$ being p.s.d.,
Table 1: Previous work. A dash marks an entry that is empty in the table.

p range      Entry                    1 < q < 2              q = 2            q > 2
1 < p < 2    Best approximation       poly(n)                O(1) [Nes98]     O(1) [Nes98]
             Hardness                 p ≤ q: NP-hard         NP-hard          NP-hard
             Non-negative matrices    -                      Exact [Ste05]    Exact [Ste05]
p = 2        Best approximation       poly(n)                Exact            O(1) [Nes98]
             Hardness                 -                      -                NP-hard
             Non-negative matrices    -                      Exact            Exact [Ste05]
p > 2        Best approximation       poly(n)                poly(n)          poly(n)
             Hardness                 -                      -                p ≤ q: NP-hard
             Non-negative matrices    -                      -                -

Table 2: Our results. We give better algorithms for non-negative matrices and obtain almost-polynomial
hardness results when $q \ge p$. A dash marks an entry that is empty in the table.

p range      Entry                    1 < q < 2                            q = 2     q > 2
1 < p < 2    Hardness                 p ≤ q: 2^{(log n)^{1-ε}}-hard        -         -
             Non-negative matrices    p ≤ q: Exact                         Exact     Exact
p = 2        Hardness                 -                                    -         -
             Non-negative matrices    -                                    Exact     Exact
p > 2        Hardness                 -                                    -         p ≤ q: 2^{(log n)^{1-ε}}-hard
             Non-negative matrices    -                                    -         p ≤ q: Exact



constant factor approximation algorithms are known, due to [Nes98]. Computing $\|A\|_{\infty \mapsto 2}$ reduces
to maximizing a quadratic form over the $\pm 1$ domain for p.s.d. matrices [CW04, Nes98].

Recently, [DVTV09] studies an optimization problem which has an $\ell_p$ norm objective: they
wish to find the best $k$-dimensional subspace approximation to a set of points, where one wishes
to minimize the $\ell_p$ distances to the subspace (there are other problems in approximation theory
which are of similar nature). When $k = n - 1$ this can be shown to reduce to the $L_p$ Grothendieck
problem for the matrix $A^{-1}$.

1.3 Paper Outline 

We start by presenting the algorithm for positive matrices (Section 3.1), and prove poly time
convergence (Section 3.2). Some additional properties of the optimization problem are discussed
in Section 4 (such as unique maximum, concavity around optimum), which will be useful for
an oblivious routing application. This will be presented in Section 5. Finally in Section 6, we
study the inapproximability of the problem: we first show a constant factor hardness for $\|A\|_{p \mapsto p}$
(Section 6.1), and show how to amplify it (Section 6.2). Then we use this to show hardness for
$\|A\|_{q \mapsto p}$ in Section 6.3.
2 Notation and Simplifications



We write $\mathbb{R}_+$ for the set of non-negative reals. For a matrix $A$, we let $A_i$ denote the $i$th row of $A$.
Also $a_{ij}$ denotes the element in the $i$th row and $j$th column. Similarly for a vector $x$, we denote
the $i$th co-ordinate by $x_i$. We say that a vector $x$ is positive if the entries $x_i$ are all $> 0$. Finally,
for two vectors $x, y$, we write $x \propto y$ to mean that $x$ is proportional to $y$, i.e., $x = \lambda y$ for some $\lambda$
(in all places we use it, $\lambda$ will be $> 0$).

For our algorithmic results, it will be much more convenient to work with matrices where we
restrict the entries to be in $[1/N, 1]$, for some parameter $N$ (zero entries can cause minor problems).
If we are interested in a $(1 + \delta)$ approximation, we can first scale $A$ such that the largest entry is
1, pick $N = (m + n)^2 / \delta$, where $m, n$ are the dimensions of the matrix, and work with the matrix
$A + \frac{1}{N} J$ (here $J$ is the $m \times n$ matrix of ones). The justification for this can be found in Appendix A.3.
We will refer to such $A$ as a positive matrix.
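The preprocessing step above can be sketched as a small helper. This is a minimal sketch: the name `make_positive` is an assumption for this example, and $N$ is taken as a caller-supplied parameter rather than derived from the target accuracy.

```python
import numpy as np


def make_positive(A, N):
    """Scale a non-negative matrix so its largest entry is 1, then add J / N.

    J is the all-ones matrix of the same shape. After this step every entry
    is at least 1/N (and at most 1 + 1/N), so no entry is zero.
    """
    A = np.asarray(A, dtype=float)
    if (A < 0).any():
        raise ValueError("expected a matrix with non-negative entries")
    if A.max() == 0:
        raise ValueError("expected at least one non-zero entry")
    scaled = A / A.max()
    return scaled + np.ones_like(scaled) / N


# Hypothetical input with a zero entry.
B = make_positive([[0.0, 2.0], [4.0, 1.0]], N=100)
```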



3 An Iterative Algorithm 

In this section, we consider positive matrices $A$, and prove that if $1 < p \le q$, we can efficiently
compute $\|A\|_{q \mapsto p}$. Suppose $A$ is of dimensions $n \times n$, and define $f : \mathbb{R}^n_+ \to \mathbb{R}$ by

\[ f(x) = \frac{\|Ax\|_p}{\|x\|_q}. \]

We present an algorithm due to Boyd [Boy74], and prove that it converges quickly to the
optimum vector. The idea is to consider $\nabla f$, and rewrite the equation $\nabla f = 0$ as a fixed point
equation (i.e., as $Sx = x$, for an appropriate operator $S$). The iterative algorithm then starts with
some vector $x$, and applies $S$ repeatedly. Note that in the case $p = q = 2$, this mimics the familiar power
iteration (in this case $S$ will turn out to be multiplication by the matrix $A^T A$ (up to normalization)).

3.1 Algorithm description 

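Let us start by looking at $\nabla f$.

\[ \frac{\partial f}{\partial x_i} = \frac{\|x\|_q \|Ax\|_p^{1-p} \cdot \sum_j a_{ji} |A_j x|^{p-1} - \|Ax\|_p \|x\|_q^{1-q} \cdot |x_i|^{q-1}}{\|x\|_q^2} \tag{1} \]

At a critical point, $\frac{\partial f}{\partial x_i} = 0$ for all $i$. Thus for all $i$,

\[ |x_i|^{q-1} = \frac{\|x\|_q^q}{\|Ax\|_p^p} \cdot \sum_j a_{ji} |A_j x|^{p-1} \tag{2} \]

Define an operator $S : \mathbb{R}^n_+ \to \mathbb{R}^n_+$, with the $i$th co-ordinate of $Sx$ being (note that all terms
involved are positive)

\[ (Sx)_i = \Big( \sum_j a_{ji} (A_j x)^{p-1} \Big)^{1/(q-1)}. \]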

Thus, at a critical point, $Sx \propto x$. Now consider the following algorithm:
(Input. An $n \times n$ matrix $A$ with all entries in $[\frac{1}{N}, 1]$, error parameter $\delta$.)

1: Initialize $x = \mathbf{1} / \|\mathbf{1}\|_q$, where $\mathbf{1}$ is the all-ones vector.
2: loop {$T$ times (it will turn out $T = (Nn) \cdot \mathrm{polylog}(N, n, 1/\delta)$)}
3:   set $x \leftarrow Sx$.
4:   normalize $x$ to make $\|x\|_q = 1$.
5: end loop

A fixed point of the iteration is a vector $x$ such that $Sx \propto x$. Thus every critical point of $f$ is a
fixed point. It turns out that every positive fixed point is also a critical point. Further, there will
be a unique positive fixed point, which is also the unique maximum of $f$.
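The fixed point iteration can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: $A$ is a positive matrix, $1 < p \le q$, the number of iterations `T` is simply a caller-chosen constant rather than the bound from the analysis, and the names `apply_S` and `boyd_iteration` are made up for this example.

```python
import numpy as np


def apply_S(A, x, p, q):
    """(Sx)_i = (sum_j a_ji * (A_j x)^(p-1))^(1/(q-1)) for positive A and x."""
    return (A.T @ (A @ x) ** (p - 1)) ** (1.0 / (q - 1))


def boyd_iteration(A, p, q, T=200):
    """Run T rounds of x <- Sx followed by normalization in the q-norm."""
    x = np.ones(A.shape[1])
    x /= np.linalg.norm(x, q)
    for _ in range(T):
        x = apply_S(A, x, p, q)
        x /= np.linalg.norm(x, q)
    value = np.linalg.norm(A @ x, p) / np.linalg.norm(x, q)
    return x, value


rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.0, size=(5, 5))  # hypothetical positive matrix

# For p = q = 2 the update is x <- A^T A x, i.e. ordinary power iteration.
x2, value2 = boyd_iteration(A, p=2, q=2)

# A case with p < q, where no closed form is available.
x_pq, value_pq = boyd_iteration(A, p=1.5, q=3)
```

For $p = q = 2$ the returned value can be compared against the largest singular value of $A$; for other parameters the value is the estimate of $\|A\|_{q \mapsto p}$ produced by the iteration.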



3.2 Analyzing the Algorithm 

We will treat $f(x)$ as defined over the domain $\mathbb{R}^n_+$. Since the matrix $A$ is positive, the maximum
must be attained in $\mathbb{R}^n_+$. Since $f$ is invariant under scaling $x$, we restrict our attention to points in
$S_q^n = \{x : x \in \mathbb{R}^n_+, \|x\|_q = 1\}$. Thus the algorithm starts with a point in $S_q^n$, and in each iteration
moves to another point, until it converges.

First, we prove that the maximum of $f$ over $S_q^n$ occurs at an interior point (i.e., none of the
co-ordinates are zero). Let $x^*$ denote a point at which maximum is attained, i.e., $f(x^*) = \|A\|_{q \mapsto p}$
($x^*$ need not be unique). Since it is an interior point, $\nabla f = 0$ at $x^*$, and so $x^*$ is a fixed point for
the iteration.

Lemma 3.1. Let $x^* \in S_q^n$ be a point at which $f$ attains maximum. Then each co-ordinate of $x^*$ is
at least $\frac{1}{(Nn)^2}$.

The proof of this can be found in Section 4. Next, we show that with each iteration, the value
of the function cannot decrease. This was proved by [Boy74] (we refer to their paper for the proof).

Lemma 3.2. ([Boy74]) For any vector $x$, we have

\[ \frac{\|ASx\|_p}{\|Sx\|_q} \ge \frac{\|Ax\|_p}{\|x\|_q}. \]

The analysis of the algorithm proceeds by maintaining two potentials, defined by

\[ m(x) = \min_i \frac{(Sx)_i}{x_i} \quad \text{and} \quad M(x) = \max_i \frac{(Sx)_i}{x_i}. \]

If $x$ is a fixed point, then $m(x) = M(x)$. Also, from Section 3.1 each is equal to $\left( \frac{\|Ax\|_p^p}{\|x\|_q^q} \right)^{1/(q-1)}$.
As observed in [Boy74], these quantities can be used to 'sandwich' the norm. In particular:

Lemma 3.3. For any positive vector $x$ with $\|x\|_q = 1$, we have

\[ m(x)^{q-1} \le \|A\|_{q \mapsto p}^p \le M(x)^{q-1}. \]

The lemma is crucial - it relates the norm (which we wish to compute) to certain quantities we 
can compute starting with any positive vector x. We now give a proof of this lemma. Our proof, 
however, has the additional advantage that it immediately implies the following: 

Lemma 3.4. The maximum of $f$ on $S_q^n$ is attained at a unique point $x^*$. Further, this $x^*$ is the
unique critical point of $f$ on $S_q^n$ (which also means it is the unique fixed point for the iteration).
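The sandwich bound of Lemma 3.3 can be checked numerically in the case $p = q = 2$, where the norm is the largest singular value. The sketch below is illustrative only; the helper name `potentials` and the random positive test data are assumptions for this example.

```python
import numpy as np


def potentials(A, x, p, q):
    """Return (m(x), M(x)): the smallest and largest ratio (Sx)_i / x_i."""
    Sx = (A.T @ (A @ x) ** (p - 1)) ** (1.0 / (q - 1))
    ratios = Sx / x
    return ratios.min(), ratios.max()


rng = np.random.default_rng(1)
A = rng.uniform(0.1, 1.0, size=(5, 5))   # hypothetical positive matrix
x = rng.uniform(0.1, 1.0, size=5)        # any positive vector
x /= np.linalg.norm(x, 2)                # normalize so that ||x||_q = 1

p = q = 2
m_val, M_val = potentials(A, x, p, q)
norm_to_p = np.linalg.norm(A, 2) ** p    # ||A||^p for p = q = 2

# Lemma 3.3 reads: m(x)^(q-1) <= ||A||^p <= M(x)^(q-1).
lower, upper = m_val ** (q - 1), M_val ** (q - 1)
```

In this special case the bound is the classical min/max ratio estimate for the top eigenvalue of the positive matrix $A^T A$.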

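Proof (of Lemma 3.3). Let $x \in S_q^n$ be a positive vector. Let $x^* \in S_q^n$ be a vector which maximizes
$f(x)$.

The first inequality is a simple averaging argument:

\begin{align}
\sum_i x_i^q \cdot \frac{(Sx)_i^{q-1}}{x_i^{q-1}} &= \sum_i x_i \cdot \sum_j a_{ji} (A_j x)^{p-1} \tag{3} \\
&= \sum_j (A_j x)^p = \|Ax\|_p^p \le \|A\|_{q \mapsto p}^p \tag{4}
\end{align}

The last inequality uses $\|x\|_q = 1$. Since $\sum_i x_i^q = 1$, the left-hand side of (3) is a weighted average of the
ratios $(Sx)_i^{q-1} / x_i^{q-1}$. Thus there exists an index $i$ such that $(Sx)_i^{q-1} / x_i^{q-1} \le \|Ax\|_p^p \le \|A\|_{q \mapsto p}^p$.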

The latter inequality is more tricky: it gives an upper bound on $f(x^*)$, no matter which $x \in S_q^n$
we start with. To prove this, we start by observing that $x^*$ is a fixed point, and thus for all $k$,

\[ m(x^*)^{q-1} = M(x^*)^{q-1} = \frac{(Sx^*)_k^{q-1}}{(x^*_k)^{q-1}} = \|A\|_{q \mapsto p}^p. \]

Call this quantity $\lambda$. Now, let $\theta > 0$ be the smallest real number such that $x - \theta x^*$ has a zero
co-ordinate, i.e., $x_k = \theta x^*_k$, and $x_j \ge \theta x^*_j$ for $j \ne k$. Since $\|x\|_q = \|x^*\|_q$ and $x \ne x^*$, $\theta$ is well-defined,
and $x_j > \theta x^*_j$ (strictly) for some index $j$. Because of these, and since each $a_{ij}$ is strictly
positive, we have $Sx > S(\theta x^*) = \theta^{(p-1)/(q-1)} S(x^*)$ (clear from the definition of $S$).

Now, for the index $k$, we have

\[ \frac{(Sx)_k^{q-1}}{x_k^{q-1}} > \frac{\theta^{p-1} (Sx^*)_k^{q-1}}{(\theta x^*_k)^{q-1}} = \theta^{p-q} \cdot \lambda \tag{5} \]

Thus we have $M(x)^{q-1} > \lambda$ (since $q \ge p$, and $0 < \theta < 1$), which is what we wanted to prove. □

Let us see how this implies Lemma 3.4.

Proof (of Lemma 3.4). Let $x^* \in S_q^n$ denote a vector which maximizes $f$ over $S_q^n$ (thus $x^*$ is one
fixed point of $S$). Suppose, if possible, that $y$ is another fixed point. By the calculation in Eq. (4)
(and since $y$ is a fixed point and $x^*$ maximizes $f$), we have

\[ M(y)^{q-1} = m(y)^{q-1} = \frac{\|Ay\|_p^p}{\|y\|_q^q} = f(y)^p \le f(x^*)^p. \]

Now since $y \ne x^*$, the argument above (of considering the smallest $\theta$ such that $y - \theta x^*$ has a zero
co-ordinate, and so on) will imply that $M(y)^{q-1} > \lambda = f(x^*)^p$, which is a contradiction.

This proves that there is no other fixed point. □

The next few lemmas say that as the algorithm proceeds, the value of $m(x)$ increases, while
$M(x)$ decreases. Further, it turns out we can quantify how much they change: if we start with an
$x$ such that $M(x)/m(x)$ is 'large', the ratio drops significantly in one iteration.

Lemma 3.5. Let $x$ be a positive vector. Then $m(x) \le m(Sx)$, and $M(x) \ge M(Sx)$.

Proof. Suppose $m(x) = \lambda$. So for every $i$, we have $(Sx)_i \ge \lambda x_i$. Now fix some index $i$ and consider
the quantity

\[ (SSx)_i^{q-1} = \sum_j a_{ji} (A_j Sx)^{p-1}. \]

Since $A$ is a positive matrix and $(Sx)_i \ge \lambda x_i$, we must have $(A_j Sx) \ge \lambda \cdot (A_j x)$ for every $j$. Thus

\[ (SSx)_i^{q-1} \ge \lambda^{p-1} \sum_j a_{ji} (A_j x)^{p-1} = \lambda^{p-1} (Sx)_i^{q-1}. \]

This shows that $m(Sx) \ge \lambda$. A similar argument shows that $M(Sx) \le M(x)$. □

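Lemma 3.6. Let $x$ be a positive vector with $\|x\|_q = 1$, and suppose $M(x) \ge (1 + \alpha) m(x)$. Then
$m(Sx) \ge \left(1 + \frac{\alpha}{Nn}\right) m(x)$.

Proof. Let $m(x) = \lambda$, and suppose $k$ is an index such that $(Sx)_k \ge (1 + \alpha)\lambda \cdot x_k$ (such an index
exists because $M(x) \ge (1 + \alpha)\lambda$). In particular, $(Sx) \ge \lambda x + \alpha\lambda \cdot e_k$, where $e_k$ is the standard basis
vector with the $k$th entry non-zero. Thus we can say that for every $j$,

\[ A_j(Sx) \ge \lambda A_j x + \alpha\lambda A_j e_k. \]

The second term will allow us to quantify the improvement in $m(x)$. Note that $A_j e_k = a_{jk} \ge
\frac{1}{Nn} A_j \mathbf{1}$ (since $a_{jk}$ is not too small). Now $\mathbf{1} \ge x$ since $x$ has $q$-norm 1, and thus we have

\[ A_j(Sx) \ge \left(1 + \frac{\alpha}{Nn}\right) \lambda \cdot A_j x. \]

Thus $(SSx)_i^{q-1} \ge \left(1 + \frac{\alpha}{Nn}\right)^{p-1} \lambda^{p-1} (Sx)_i^{q-1}$, implying that $m(Sx) \ge \left(1 + \frac{\alpha}{Nn}\right)\lambda$. □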

This immediately implies that the value $\|A\|_{q \mapsto p}$ can be computed quickly. In particular,

Theorem 3.7. For any $\delta > 0$, after $O(Nn \cdot \mathrm{polylog}(N, n, \frac{1}{\delta}))$ iterations, the algorithm of Section 3.1
finds a vector $x$ such that $f(x) \ge (1 - \delta) f(x^*)$.

Proof. To start with, the ratio $\frac{M(x)}{m(x)}$ is at most $Nn$ (since we start with $\mathbf{1}$, and the entries of the
matrix lie in $[1/N, 1]$). Lemma 3.6 now implies that the ratio drops from $(1 + \alpha)$ to $(1 + \frac{\alpha}{2})$ in
$Nn$ iterations. Thus in $T = (Nn)\,\mathrm{polylog}(N, n, 1/\delta)$ steps, the $x$ we end up with has $\frac{M(x)}{m(x)}$ at
most $\left(1 + \frac{\delta}{(Nn)^c}\right)$ for any constant $c$. This then implies that $f(x) \ge f(x^*)\left(1 - \frac{\delta}{(Nn)^c}\right)$, after $T$
iterations. □
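The ratio $M(x)/m(x)$ used in the proof also suggests a practical stopping rule: iterate until the two potentials nearly agree, and report the interval they certify via Lemma 3.3. The following is a minimal sketch under stated assumptions ($A$ positive, $1 < p \le q$); the function name, the tolerance `tol` and the cap `max_iter` are choices made for this example, not part of the analysis.

```python
import numpy as np


def iterate_until_tight(A, p, q, tol=1e-9, max_iter=10000):
    """Iterate x <- Sx / ||Sx||_q until M(x) <= (1 + tol) * m(x).

    Returns the final x (with ||x||_q = 1), the bounds m(x)^((q-1)/p) and
    M(x)^((q-1)/p) on the norm, and the number of updates performed.
    """
    x = np.ones(A.shape[1])
    x /= np.linalg.norm(x, q)
    steps = 0
    while True:
        Sx = (A.T @ (A @ x) ** (p - 1)) ** (1.0 / (q - 1))
        ratios = Sx / x
        if ratios.max() <= (1 + tol) * ratios.min() or steps >= max_iter:
            break
        x = Sx / np.linalg.norm(Sx, q)
        steps += 1
    lower = ratios.min() ** ((q - 1) / p)
    upper = ratios.max() ** ((q - 1) / p)
    return x, lower, upper, steps


rng = np.random.default_rng(2)
A = rng.uniform(0.1, 1.0, size=(5, 5))  # hypothetical positive matrix
x2, lo2, hi2, steps2 = iterate_until_tight(A, p=2, q=2)
x_pq, lo_pq, hi_pq, steps_pq = iterate_until_tight(A, p=1.5, q=3)
f_pq = np.linalg.norm(A @ x_pq, 1.5) / np.linalg.norm(x_pq, 3)
```

The current objective value $f(x)$ always lies between the two returned bounds, because $\|Ax\|_p^p$ is a weighted average of the ratios $(Sx)_i^{q-1} / x_i^{q-1}$ when $\|x\|_q = 1$.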



4 Proximity to the optimum 

The argument above showed that the algorithm finds a point $x$ such that $f(x)$ is close to $f(x^*)$.
We proved that for positive matrices, $x^*$ is unique, and thus it is natural to ask if the vector we
obtain is 'close' to $x^*$. This in fact turns out to be important in an application to oblivious routing
which we consider in Section 5.

We can prove that the $x$ we obtain after $T = (Nn)\,\mathrm{polylog}(N, n, 1/\delta)$ iterations is 'close' to $x^*$.
The rough outline of the proof is the following: we first show that $f(x)$ is strictly concave 'around'
the optimum.^2 Then we show that the 'level sets' of $f$ are 'connected' (precise definitions follow).
Then we use these to prove that if $f(x)$ is close to $f(x^*)$, then $x - x^*$ is 'small' (the choice of norm
does not matter much).

Some of these results are of independent interest, and shed light into why the $q \mapsto p$ problem
may be easier to solve when $p \le q$ (even for non-negative matrices).

^2 Note that the function $f$ is not concave everywhere (see Appendix A.1).
Concavity around the optimum. We now show that the neighborhood of every critical point
(where $\nabla f$ vanishes) is strictly concave. This is another way of proving that every critical point is
a maximum (this was the way [ER09] prove this fact in the $p = q$ case).
Taking partial derivatives of $f(x) = \frac{\|Ax\|_p}{\|x\|_q}$, we observe that

\[ \frac{\partial f}{\partial x_i} = f(x) \left( \frac{\sum_k (A_k x)^{p-1} a_{ki}}{\|Ax\|_p^p} - \frac{x_i^{q-1}}{\|x\|_q^q} \right) \tag{6} \]

where $A_k$ refers to the $k$th row of matrix $A$. Now, consider a critical point $z$, with $\|z\|_q = 1$
(w.l.o.g.). We can also always assume that w.l.o.g. the matrix $A$ is such that $\|Az\|_p = 1$. Thus at
a critical point $z$, as in Eq. (2), we have that for all $i$:

\[ \sum_k (A_k z)^{p-1} a_{ki} = z_i^{q-1} \tag{7} \]
k 

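Computing the second derivative of $f$ at $z$, and simplifying using $\|Az\|_p = \|z\|_q = 1$, we obtain

\begin{align}
\frac{1}{p} \cdot \left. \frac{\partial^2 f}{\partial x_i \partial x_j} \right|_z &= (p-1) \sum_k (A_k z)^{p-2} a_{ki} a_{kj} + (q-p) z_i^{q-1} z_j^{q-1} \tag{8} \\
\frac{1}{p} \cdot \left. \frac{\partial^2 f}{\partial x_i^2} \right|_z &= (p-1) \sum_k (A_k z)^{p-2} a_{ki}^2 + (q-p) z_i^{2q-2} - (q-1) z_i^{q-2} \tag{9}
\end{align}

We will now show that the Hessian $H_f$ is negative semi-definite, which proves that $f$ is strictly
concave at the critical point $z$. Let $\varepsilon$ be any vector in $\mathbb{R}^n$. Then we have (the $(q-1) z_i^{q-2}$ in (9) is
split as $(p-1) z_i^{q-2} + (q-p) z_i^{q-2}$, and $\sum_{i,j}$ includes the case $i = j$)

\begin{align*}
\varepsilon^T H_f \varepsilon &= p(p-1) \Big( \sum_{i,j} \sum_k (A_k z)^{p-2} a_{ki} a_{kj} \cdot \varepsilon_i \varepsilon_j - \sum_i z_i^{q-2} \varepsilon_i^2 \Big) \\
&\quad + p(q-p) \Big( \sum_{i,j} (z_i z_j)^{q-1} \varepsilon_i \varepsilon_j - \sum_i z_i^{q-2} \varepsilon_i^2 \Big) \\
&= T_1 + T_2 \quad \text{(say)}
\end{align*}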

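We consider $T_1$ and $T_2$ individually and prove that they are negative. First consider $T_2$. Since
$\sum_i z_i^q = 1$, we can consider $z_i^q$ to be a probability distribution on integers $1, \ldots, n$. Cauchy-Schwarz
now implies that $\mathbb{E}_i[(\varepsilon_i/z_i)^2] \ge \left(\mathbb{E}_i[(\varepsilon_i/z_i)]\right)^2$. This is equivalent to

\[ \sum_i z_i^{q-2} \varepsilon_i^2 \ge \Big( \sum_i z_i^{q-1} \varepsilon_i \Big)^2 = \sum_{i,j} (z_i z_j)^{q-1} \varepsilon_i \varepsilon_j. \]

Noting that $q \ge p$, we can conclude that $T_2 \le 0$. Now consider $T_1$. Since $z$ is a fixed point, it
satisfies Eq. (7), thus we can substitute for $z_i^{q-1}$ in the second term of $T_1$. Expanding out $(A_k z)$
once and simplifying, we get

\begin{align*}
\frac{T_1}{p(p-1)} &= \sum_{i,j} \sum_k (A_k z)^{p-2} a_{ki} a_{kj} \Big( \varepsilon_i \varepsilon_j - \frac{z_j}{z_i} \cdot \varepsilon_i^2 \Big) \\
&= - \sum_k (A_k z)^{p-2} \sum_{i < j} a_{ki} a_{kj} \cdot z_i z_j \cdot \Big( \frac{\varepsilon_i}{z_i} - \frac{\varepsilon_j}{z_j} \Big)^2 \\
&\le 0.
\end{align*}

This proves that $f$ is concave around any critical point $z$.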
Level sets of $f$. Let $S_q^n$, as earlier, denote the (closed, compact) set $\{x \in \mathbb{R}^n_+ : \|x\|_q = 1\}$. Let
$M_\tau$ denote $\{x \in S_q^n : f(x) \ge \tau\}$, i.e., $M_\tau$ is an 'upper level set' (it is easy to see that since $f$ is
continuous and $A$ is positive, $M_\tau$ is closed).

Let $S \subseteq S_q^n$. We say that two points $x$ and $y$ are connected in $S$, if there exists a path (a
continuous curve) connecting $x$ and $y$, entirely contained in $S$ (and this is clearly an equivalence
relation). We say that a set $S$ is connected if every $x, y \in S$ are connected in $S$. Thus any subset
of $S_q^n$ can be divided into connected components. With this notation, we show ([ER09] proves the
result when $p = q$).

Lemma 4.1. The set $M_\tau$ is connected for every $\tau > 0$.

This follows easily from techniques we developed so far. 

Proof. Suppose if possible, that $M_\tau$ has two disconnected components $S_1$ and $S_2$. Since there is
a unique global optimum $x^*$, we may suppose $S_1$ does not contain $x^*$. Let $y$ be the point in $S_1$
which attains maximum (of $f$) over $S_1$ ($y$ is well defined since $M_\tau$ is closed). Now if $\nabla f|_y = 0$, we
get a contradiction since $f$ has a unique critical point, namely $x^*$ (Lemma 3.4). If $\nabla f|_y \ne 0$, it
has to be normal to the surface $S_q^n$ (else it cannot be that $y$ attains maximum in the connected
component $S_1$). Let $z$ be the direction of the (outward) normal to $S_q^n$ at the point $y$. Clearly,
$\langle z, y \rangle > 0$ (intuitively this is clear; it is also easy to check).

We argued that $\nabla f|_y$ must be parallel to $z$, and thus it has a non-zero component along $y$.
In particular, if we scale $y$ (equivalent to moving along $y$), the value of $f$ changes, which is clearly
false! Thus $M_\tau$ has only one connected component. □

Since we need it for what follows, let us now prove Lemma 3.1.

Proof of Lemma 3.1. Let $x^*$ be the optimum vector, and suppose $\|x^*\|_q = 1$. Consider the quantity

\[ f(x^*)^p = \frac{\sum_i (A_i x^*)^p}{\left( \sum_i (x^*_i)^q \right)^{p/q}}. \]

First, note that $x^*_i \ne 0$ for any $i$. Suppose there is such an $i$. If we set $x_i = \delta$, each term in the
numerator above increases by at least $\frac{p \cdot \delta}{N^p}$ (because $A_i x^*$ is at least $\frac{1}{N}$, and $\left(\frac{1}{N} + \frac{\delta}{N}\right)^p > \frac{1}{N^p}(1 + p\delta)$),
while the denominator increases from 1 to $(1 + \delta^q)^{p/q} \approx 1 + (p/q)\delta^q$ for small enough $\delta$. Thus since
$q > 1$, we can set $\delta$ small enough and increase the objective. This implies that $x^*$ is a positive
vector.

Note that $A_j x^* \ge \frac{1}{N} \cdot \sum_i x^*_i \ge \frac{1}{N}$ (because $\|x^*\|_1 \ge \|x^*\|_q = 1$). Thus for every $i$,

\[ (Sx^*)_i^{q-1} = \sum_j a_{ji} (A_j x^*)^{p-1} \ge \frac{n}{N^p}. \]

Further, $\|A\|_{q \mapsto p}^p \le n^{p+1}$, because each $a_{ij} \le 1$ and so $A_j x \le n x_{\max}$ (where $x_{\max}$ denotes the largest
co-ordinate of $x$). Now since Eqn. (2) holds for $x^*$, we have

\[ n^{p+1} \ge \|A\|_{q \mapsto p}^p = \frac{(Sx^*)_i^{q-1}}{(x^*_i)^{q-1}} \ge \frac{n}{N^p (x^*_i)^{q-1}}. \]

This implies that $x^*_i > \frac{1}{(Nn)^2}$, proving the lemma (we needed to use $q \ge p > 1$ to simplify). □






We now show that if $x \in S_q^n$ is 'far' from $x^*$, then $f(x)$ is bounded away from $f(x^*)$. This,
along with the fact that $M_\tau$ is connected for all $\tau$, implies that if $f(x)$ is very close to $f(x^*)$, then
$\|x - x^*\|_1$ must be small. For ease of calculation, we give the formal proof only for $p = q$ (this is
also the case which is used in the oblivious routing application). It should be clear that as long
as we have that the Hessian at $x^*$ is negative semidefinite, and third derivatives are bounded, the
proof goes through.

Lemma 4.2 (Stability). Suppose $x \in S_q^n$, with $\|x - x^*\|_1 = \delta \le \frac{1}{(Nn)^{12}}$. Then

\[ f(x) \le f(x^*) \left( 1 - \frac{\delta^2}{(Nn)^6} \right). \]

Proof. Let $\varepsilon$ denote the 'error vector' $\varepsilon = x - x^*$. We will use the Taylor expansion of $f$ around
$x^*$. $H_f$ denotes the Hessian of $f$ and $g_f$ is a term involving the third derivatives, which we will get
to later. Thus we have: (note that $\nabla f$ and $H_f$ are evaluated at $x^*$)

\[ f(x) = f(x^*) + \varepsilon \cdot \nabla f|_{x^*} + \frac{1}{2} \varepsilon^T H_f|_{x^*} \varepsilon + g_f(\varepsilon') \tag{12} \]

At $x^*$, the $\nabla f$ term is 0. From the proof above that the Hessian is negative semidefinite, we have

\[ \varepsilon^T H_f \varepsilon = -p(p-1) \sum_s (A_s x^*)^{p-2} \Big( \sum_{i<j} a_{si} a_{sj} x^*_i x^*_j \Big( \frac{\varepsilon_i}{x^*_i} - \frac{\varepsilon_j}{x^*_j} \Big)^2 \Big) \tag{13} \]

We want to say that if $\|\varepsilon\|_1$ is large enough, this quantity is sufficiently negative. We should
crucially use the fact that $\|x^*\|_p = \|x^* + \varepsilon\|_p = 1$ (since $x$ is a unit vector in $p$-norm). This is the
same as

\[ \sum_i |x^*_i + \varepsilon_i|^p = \sum_i |x^*_i|^p. \]

Thus not all $\varepsilon_i$ are of the same sign. Now since $\|\varepsilon\|_1 \ge \delta$, at least one of the $\varepsilon_i$ must have absolute
value at least $\delta/n$, and some other $\varepsilon_j$ must have the opposite sign, by the above observation. Now
consider the terms corresponding to these $i, j$ in Eqn. (13). This gives



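\begin{align}
\varepsilon^T H_f \varepsilon &\le -p(p-1) \sum_s (A_s x^*)^{p-2} \cdot a_{si} a_{sj} \cdot \frac{x^*_j}{x^*_i} \cdot \frac{\delta^2}{n^2} \tag{14} \\
&\le -p(p-1) \sum_s (A_s x^*)^{p-2} \cdot \frac{\delta^2}{N^4 n^4} \tag{15} \\
&\le -p(p-1) \cdot \frac{\delta^2}{N^4 n^6} \cdot \|Ax^*\|_p^p \tag{16}
\end{align}

Note that we used the facts that entries $a_{ij}$ lie in $[\frac{1}{N}, 1]$ and that $x^*_i \in [\frac{1}{(Nn)^2}, 1]$ (and, for the last
step, that $A_s x^* \le n$). Thus it only
remains to bound the third order terms ($g_f$, in Eqn. (12)). This contribution equals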

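\[ g_f(\varepsilon) = \frac{1}{3!} \sum_{i,j,k} \varepsilon_i \varepsilon_j \varepsilon_k \frac{\partial^3 f}{\partial x_i \partial x_j \partial x_k}. \]

It can be shown by expanding out, and using the facts that $a_{si} \le N (A_s x^*)$ and $\frac{1}{x^*_i} \le (Nn)^2$ that
for $i, j, k$,

\[ \left| \frac{\partial^3 f}{\partial x_i \partial x_j \partial x_k} \right| \le 10 p^3 (Nn)^3 \|Ax^*\|_p^p. \]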
Thus, the higher order terms can be bounded by

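\[ g_f(\varepsilon) \le 10 p^3 \cdot n^3 N^3 \cdot \|Ax^*\|_p^p \cdot \delta^3. \]

So, if $\delta < \frac{p-1}{10 p^2} \cdot \frac{1}{(Nn)^{12}}$, the Hessian term dominates. Thus we have, as desired:

\[ f(x) \le f(x^*) \left( 1 - \frac{\delta^2}{(Nn)^6} \right). \]

□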

This proves that the vector we obtain at the end of the $T$ iterations (for $T$ as specified) has an
$\ell_1$ distance at most $\frac{1}{(Nn)^c}$ to $x^*$. Thus we have a polynomial time algorithm to compute $x^*$ to any
accuracy.

5 An Application - $O(\log n)$ Oblivious routing scheme for $\ell_p$

We believe that our algorithm for computing the $\|A\|_{q \mapsto p}$ (for non-negative matrices) could find
good use as an optimization tool. For instance, eigenvalue computation is used extensively, not just
for partitioning and clustering problems, but also as a subroutine for solving semi-definite programs
[GLS88]. We now give one application of our algorithm and the techniques we developed in Section
4 to the case of oblivious routing in the $\ell_p$ norm.

Oblivious routing. As outlined in the Introduction, the aim in oblivious routing is, given a
graph $G = (V, E)$, to specify how to route a unit flow between every pair of vertices in $V$. Now,
given a demand vector (demands between pairs of vertices), these unit flows are scaled linearly by
the demands, and routed (let us call this the oblivious flow). This oblivious flow is compared to
the best flow in hindsight, i.e., knowing the demand vector, with respect to some objective (say
congestion), and we need to come up with a scheme which bounds this competitive ratio in the
worst case.

Gupta et al. [GHR06] consider the oblivious routing problem where the cost of a solution is
the $\ell_p$ norm of the 'flow vector' (the vector consisting of total flow on each edge). In the case
$p = \infty$, this is the problem of minimizing congestion, for which the celebrated result of [R08] gave
an $O(\log n)$ competitive scheme. For the $\ell_1$ version of the problem, the optimal solution (as is
easily seen) is to route along shortest paths for each demand pair. The $\ell_p$ version tries to trade off
between these two extremes.
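To make the linear scaling of unit flows concrete, here is a minimal Python sketch. The triangle graph, the routing table `unit_flows`, and the helper names `oblivious_flow` and `lp_cost` are illustrative assumptions and are not taken from [GHR06] or [ER09].

```python
# Hypothetical example: a triangle with edges e0=(a,b), e1=(b,c), e2=(a,c).
# unit_flows[pair][e] is the fraction of one unit of demand for `pair` sent over edge e.
unit_flows = {
    ("a", "b"): [1.0, 0.0, 0.0],
    ("b", "c"): [0.0, 1.0, 0.0],
    ("a", "c"): [0.5, 0.5, 0.5],  # half directly on e2, half via b on e0 and e1
}


def oblivious_flow(unit_flows, demands):
    """Scale each pair's fixed unit flow by its demand and sum the load on every edge."""
    num_edges = len(next(iter(unit_flows.values())))
    load = [0.0] * num_edges
    for pair, demand in demands.items():
        for edge, share in enumerate(unit_flows[pair]):
            load[edge] += demand * share
    return load


def lp_cost(load, p):
    """l_p norm of the per-edge load vector; p = float('inf') gives the congestion."""
    if p == float("inf"):
        return max(abs(v) for v in load)
    return sum(abs(v) ** p for v in load) ** (1.0 / p)


demands = {("a", "b"): 2.0, ("a", "c"): 4.0}
load = oblivious_flow(unit_flows, demands)
for p in (1, 2, float("inf")):
    print(p, lp_cost(load, p))
```

The routing table is fixed before the demands are seen, which is what makes the scheme oblivious; only the cost function changes with $p$.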

By a clever use of zero-sum games, [ER09] reduced the problem of showing the existence of good
oblivious routing schemes for any $p$ to the $\ell_\infty$ case. This showed (by a non-constructive argument)
the existence of an $O(\log n)$ competitive oblivious routing scheme for any $p > 1$. They then make their result
constructive for $p = 2$ (the proof relies heavily on eigenvectors being orthogonal). Using our
algorithm for finding the $\ell_p$-norm of a matrix and the stability of our maxima (Lemma 4.2), we
make the result constructive for all $\ell_p$.

Zero-sum game framework of [ER09]: We first give a brief overview of the non-constructive
proof from [ER09]. The worst-case demands for any tree-based oblivious routing scheme can be
shown to be those with non-zero demands only on the edges of the graph. The competitive ratio of
any tree-based oblivious routing scheme can then be reduced to a matrix $p$-norm computation: if
$M$ is a $|E| \times |E|$-dimensional matrix which represents a tree-based oblivious routing scheme which
specifies unit flows for each demand across an edge of the graph, the competitive ratio is given by
$\max_{\|u\|_p \le 1} \|Mu\|_p$ where $u \in \mathbb{R}^{|E|}$.
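For a matrix with positive entries, a quantity of the form $\max_{\|u\|_q \le 1} \|Mu\|_p$ can be estimated by a power-iteration-like fixed point scheme. The sketch below assumes the update $x \leftarrow (A^T (Ax)^{p-1})^{1/(q-1)}$ (entrywise powers, followed by renormalization in the $q$-norm), which is the stationarity condition of the ratio $\|Ax\|_p / \|x\|_q$; the function name `estimate_norm` and the iteration count are illustrative choices, not part of the text above.

```python
def p_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def estimate_norm(A, p, q=None, iters=200):
    """Estimate max ||Ax||_p / ||x||_q for a matrix A with positive entries, p, q > 1.

    Assumed update: x <- (A^T (Ax)^(p-1))^(1/(q-1)), renormalised in the q-norm.
    The returned value is always a lower bound, since x is a feasible unit vector.
    """
    q = p if q is None else q
    n = len(A[0])
    At = [list(col) for col in zip(*A)]
    x = [1.0 / n ** (1.0 / q)] * n          # positive starting vector with ||x||_q = 1
    for _ in range(iters):
        y = [t ** (p - 1) for t in matvec(A, x)]
        z = [t ** (1.0 / (q - 1)) for t in matvec(At, y)]
        scale = p_norm(z, q)
        x = [t / scale for t in z]
    return p_norm(matvec(A, x), p), x


value, vector = estimate_norm([[1.0, 2.0], [3.0, 4.0]], 2)
print(value)  # for p = q = 2 this is ordinary power iteration on A^T A
```

For $p = q = 2$ the update reduces to power iteration on $A^T A$, which gives a quick sanity check against the largest singular value.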

To show the existence of an oblivious routing scheme with competitive ratio $O(\log n)$, [ER09]
define a continuous two player zero-sum game. The first player (row player) chooses from the set
of all tree-based oblivious routing matrices (of dimension $|E| \times |E|$). The second player's (column
player) strategy set is the set of vectors $u \in \mathbb{R}^{|E|}$ with positive entries and $\|u\|_p = 1$, and the value
of the game is $\|Mu\|_p$. With a clever use of min-max duality in zero-sum games and the oblivious
routing scheme of [R08] for congestion ($\ell_\infty$-norm) as a blackbox, [ER09] show the non-constructive
existence of an oblivious routing scheme $M$ which gets a value of $O(\log n)$ for all demand vectors.

Finding such a (tree-based) oblivious routing scheme requires us to solve this zero-sum game
efficiently. The constructive algorithm from [ER09] for $\ell_2$, however, crucially uses the orthonormality
of the eigenspace for the $\|M\|_2$ computation to solve the aforementioned zero-sum game. First we state
without proof a couple of lemmas from [ER09], which will also feature in our algorithm.

Lemma 5.1. Let $OBL$ be a tree-based oblivious routing scheme given by a $|E| \times \binom{n}{2}$ dimensional
matrix (non-negative entries) and let its restriction to edges be $OBL' \in \mathbb{R}^{|E| \times |E|}$. The competitive
ratio of the oblivious algorithm is at most $\|OBL'\|_p$.

Henceforth, we shall abuse notation and use $OBL$ to refer to both the tree-based oblivious
routing matrix and its restriction to edges interchangeably. Further,

Lemma 5.2. For any given vector $u \in \mathbb{R}^{|E|}$, there exists a tree-based oblivious routing scheme
(denoted by matrix $OBL$) such that

$$\|OBL \cdot u\|_p \le O(\log n) \|u\|_p$$

This lemma shows that for every vector $u$, there exists some routing scheme (which could
depend on the vector) that is $O(\log n)$ competitive. We will now show how to compute one
tree-based routing matrix $OBL$ that works for all vectors, i.e., $\|OBL\|_p \le O(\log n)$. From Lemma 5.2 we
know that for every unit vector $u$, there exists a tree-based oblivious routing matrix $M$ such that
$\|M \cdot u\|_p \le O(\log n)$. We use this to construct one tree-based oblivious routing matrix $OBL$ that
works for every load vector $u$. Note that the set of tree-based oblivious routing schemes is convex.
Before we show how to construct the oblivious routing scheme, we present a simple lemma which
captures the continuity of the $p$-norm function.

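Lemma 5.3. Let $f(x) = \|Ax\|_p^p / \|x\|_p^p$, where $A$ is an $n \times n$ matrix with minimum entry $\frac{1}{N}$, and let $y$ be an
$n$-dimensional vector whose minimum entry is bounded below as in Lemma 4.2. Let $x$ be a vector in the
$\delta$-neighborhood of $y$, i.e., $\|x - y\|_1 = \delta$ with $\delta$ inverse-polynomially small. Then,

$$f(x) \le f(y) + 1 \qquad (18)$$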



Proof. The proof follows just from the continuity and differentiability of the $p$-norm function at
every point. Using the Taylor expansion of $f$, we see that

$$f(x) = f(y) + \epsilon \cdot \nabla f(y) + \frac{1}{2}\, \epsilon'^T H_f(y)\, \epsilon' \qquad (19)$$

where $0 \le \epsilon' \le \epsilon$. Choosing $\epsilon = \delta$ in the stated range and using the lower bounds on the matrix entries and the
co-ordinates of $y$ as in Lemma 4.2, we see that the lemma follows. □

We now sketch how to find a tree-based oblivious routing matrix when the aggregation function
is an $\ell_p$ norm.

Theorem 5.4. There exists a polynomial time algorithm that computes an oblivious routing scheme
with competitive ratio $O(\log n)$ when the aggregation function is the $\ell_p$ norm with $p > 1$ and the
load function on the edges is a norm.

Proof sketch. The algorithm and proof follow roughly along the lines of the constructive version
for $p = 2$ in [ER09]. As mentioned earlier, their proof uses inner products among the vectors (and
the computation of eigenvalues). However, we show that the procedure still works because of the
stability of our solution (Lemma 4.2).

Let $J_\epsilon$ be an $|E| \times |E|$ matrix with all entries being $\epsilon$. Let $f(M) = \|M + J_\epsilon\|_p$. We want a
tree-based oblivious routing matrix $OBL$ such that $f(OBL) \le c \log n$ for some large enough constant $c$.
We follow an iterative procedure to obtain this matrix $OBL$ starting with an arbitrary tree-based
routing matrix $M_0$. At stage $i$, we check if for the current matrix $M_i$, $\|M_i\|_p \le c \log n$. If not, using
the iterative algorithm in Section 3, we obtain a unit vector $x^*_{(i)}$ which maximizes $\|M_i x\|_p$. Let $\tilde{M}_i$
be the tree-based oblivious routing matrix from Lemma 5.2 such that $\|\tilde{M}_i x^*_{(i)}\|_p \le c \log n / 2 - 2$.
We now update

$$M_{i+1} = (1 - \lambda) M_i + \lambda \tilde{M}_i$$

Observe that this is also a tree-based oblivious routing matrix. We now show that $\|M_{i+1}\|_p$ decreases
by an amount $\Omega(1/\mathrm{poly}(n))$.

At step $i$, roughly speaking, for all vectors $y$ that are far enough from $x^*_{(i)}$, $\|M_i y\|_p \le \|M_i x^*_{(i)}\|_p -
\frac{1}{\mathrm{poly}(n)}$ from Lemma 4.2 (stability). Choosing $\lambda = \Theta(n^{-c})$ for some large enough constant $c > 0$,
it easily follows that $\|M_{i+1} y\|_p \le \|M_i x^*_{(i)}\|_p - \frac{1}{\mathrm{poly}(n)}$. On the other hand, consider $y$ in the
$\delta$-neighborhood of $x^*_{(i)}$. Using Lemma 5.3,

$$\|\tilde{M}_i y\|_p \le \frac{c \log n}{2}$$

Hence, by the triangle inequality,

$$\|M_{i+1} y\|_p \le (1 - \lambda) \|M_i y\|_p + \lambda \cdot \frac{c}{2} \log n$$

$$\le \|M_i y\|_p - \lambda \cdot \frac{c}{2} \log n \quad (\text{since } \|M_i y\|_p \ge c \log n)$$

$$\le \|M_i y\|_p - \frac{1}{\mathrm{poly}(n)}$$

Hence, it follows that the matrices $M_i$ decrease in their $p$-norm by a small quantity $\Omega(1/\mathrm{poly}(n))$
in every step. It follows that this iterative algorithm finds the required tree-based oblivious routing
scheme in $\mathrm{poly}(n)$ steps. □
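The update step relies only on the triangle inequality for convex combinations of matrices. The following small Python check illustrates this on made-up numbers; the matrices `M` and `M_tilde`, the vector `y`, and the values of `p` and `lam` are arbitrary illustrative choices and do not come from an actual routing instance.

```python
def p_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def matvec(M, y):
    return [sum(a * t for a, t in zip(row, y)) for row in M]


def mix(M, M_tilde, lam):
    """Convex combination (1 - lam) * M + lam * M_tilde, entry by entry."""
    return [[(1 - lam) * a + lam * b for a, b in zip(r, rt)]
            for r, rt in zip(M, M_tilde)]


# Arbitrary toy data: M has a large value on y, M_tilde a small one.
M = [[3.0, 1.0], [1.0, 3.0]]
M_tilde = [[1.0, 0.5], [0.5, 1.0]]
y, p, lam = [0.6, 0.8], 3, 0.1

lhs = p_norm(matvec(mix(M, M_tilde, lam), y), p)
rhs = (1 - lam) * p_norm(matvec(M, y), p) + lam * p_norm(matvec(M_tilde, y), p)
print(lhs, rhs)  # lhs never exceeds rhs, by the triangle inequality
```

Whenever the second matrix has a smaller value on $y$, the mixed matrix strictly improves on the first one at $y$, which is the effect the iterative procedure exploits.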

6 Inapproximability results

We will now prove that it is NP-hard to approximate the $\|A\|_{q \mapsto p}$-norm of a matrix to any fixed constant,
for any $q \ge p > 2$. We then show how this proof carries over to the hardness of computing the
$\infty \mapsto p$ norm.

6.1 Inapproximability of $\|A\|_{p \mapsto p}$

Let us start with the question of approximating $\|A\|_{p \mapsto p}$. We first show the following:

Proposition 6.1. For $p > 2$ it is NP-hard to approximate $\|A\|_p$ to some (small) constant
factor $\eta > 1$.

Proof. We give a reduction from the gap version of the MaxCut problem. The following is
well-known (c.f. [Has01]):

There exist constants $1/2 \le \rho < \rho' \le 1$ such that given a regular graph $G = (V, E)$ on
$n$ vertices and degree $d$, it is hard to distinguish between:
Yes case: $G$ has a cut containing at least $\rho'(nd/2)$ edges, and
No case: No cut in $G$ cuts more than $\rho(nd/2)$ edges.

Suppose we are given a graph $G = (V, E)$ which is regular and has degree $d$. The $p$-norm
instance we consider will be that of maximizing $g(x_0, \ldots, x_n)$ ($x_i \in \mathbb{R}$), defined by



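$$g(x_0, x_1, \ldots, x_n) = \frac{\sum_{i \sim j} |x_i - x_j|^p + Cd \cdot \sum_i \left( |x_0 + x_i|^p + |x_0 - x_i|^p \right)}{n |x_0|^p + \sum_i |x_i|^p}$$

Here $C$ will be chosen appropriately later. Note that if we divide by $d$, we can see $g(x)$ as the ratio

$$\frac{g(x)}{d} = \frac{\sum_{i \sim j} \left( |x_i - x_j|^p + C \left( |x_0 + x_i|^p + |x_0 - x_i|^p + |x_0 + x_j|^p + |x_0 - x_j|^p \right) \right)}{d \sum_{i \sim j} \left( 2|x_0|^p + |x_i|^p + |x_j|^p \right)}$$

The idea is to do the analysis on an edge-by-edge basis. Consider the function

$$f(x, y) = \frac{|x - y|^p + C \left( |1 + x|^p + |1 - x|^p + |1 + y|^p + |1 - y|^p \right)}{2 + |x|^p + |y|^p} \qquad (20)$$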
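The per-edge function is easy to probe numerically. The sketch below assumes $f$ is the ratio whose numerator is $|x - y|^p + C(|1 + x|^p + |1 - x|^p + |1 + y|^p + |1 - y|^p)$ and whose denominator is $2 + |x|^p + |y|^p$; the values $p = 3$ and $C = 10$ are arbitrary illustrative choices.

```python
def f(x, y, p, C):
    """Per-edge ratio: cut term plus C times the four 'anchor' terms, over 2 + |x|^p + |y|^p."""
    numerator = abs(x - y) ** p + C * (
        abs(1 + x) ** p + abs(1 - x) ** p + abs(1 + y) ** p + abs(1 - y) ** p
    )
    return numerator / (2 + abs(x) ** p + abs(y) ** p)


p, C = 3, 10                      # illustrative values only
baseline = C * 2 ** (p - 1)       # the value C * 2^(p-1) that every pair with x = y = +-1 attains

print(f(1, -1, p, C))   # endpoints on opposite sides of the cut: baseline + 2^(p-2)
print(f(1, 1, p, C))    # endpoints on the same side: exactly the baseline
print(f(0, 0, p, C))    # far from +-1: below the baseline
print(f(3, -3, p, C))   # far from +-1: below the baseline
```

Only pairs with $|x|, |y|$ close to 1 and opposite signs exceed the baseline $C \cdot 2^{p-1}$, which is the behavior the edge-by-edge analysis exploits.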



Definition. A tuple $(x, y)$ is good if both $|x|$ and $|y|$ lie in the interval $(1 - \epsilon, 1 + \epsilon)$, and $xy < 0$.
A technical lemma concerning $f$ is the following.

Lemma 6.2. For any $\epsilon > 0$, there is a large enough constant $C$ such that

$$f(x, y) \le \begin{cases} C \cdot 2^{p-1} + \dfrac{(1 + \epsilon)\, 2^p}{2 + |x|^p + |y|^p} & \text{if } (x, y) \text{ is good} \\[2mm] C \cdot 2^{p-1} & \text{otherwise} \end{cases} \qquad (21)$$

We now present the proof of Lemma 6.2. We first start with a simpler inequality - note that
this is where the condition $p > 2$ comes in.

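Lemma 6.3. For all $x \in \mathbb{R}$, we have

$$\frac{|1 + x|^p + |1 - x|^p}{1 + |x|^p} \le 2^{p-1}.$$

Further, for any $\epsilon > 0$, there exists a $\delta > 0$ such that if $|x| \notin [1 - \epsilon, 1 + \epsilon]$, then

$$\frac{|1 + x|^p + |1 - x|^p}{1 + |x|^p} \le 2^{p-1} - \delta.$$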

Proof. We may assume $x > 0$. First consider $x > 1$. Write $x = 1 + 2\theta$, and thus the first inequality
simplifies to

$$\left[ (1 + 2\theta)^p - (1 + \theta)^p \right] \ge \left[ (1 + \theta)^p - 1 \right] + 2\theta^p.$$

Now consider

$$I = \int_{x=1}^{1+\theta} \left( (x + \theta)^{p-1} - x^{p-1} \right) dx.$$

For each $x$, the function being integrated is $\ge \theta^{p-1}$, since $p > 2$ and $x > 0$. Thus the integral is at
least $\theta^p$. Now evaluating the integral independently and simplifying, we get

$$(1 + 2\theta)^p - 2(1 + \theta)^p + 1 \ge p \cdot \theta^p,$$

which gives the inequality since $p > 2$. Further there is a slack of $(p - 2)\theta^p$. Now suppose $0 < x < 1$.
Writing $x = 1 - 2\theta$ and simplifying similarly, the inequality follows. Further, since we always have
a slack, the second inequality is also easy to see. □
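The inequality of Lemma 6.3 (a one-dimensional Clarkson-type inequality) and the integrated form used in its proof can be checked numerically. The sketch below does this on a grid for the illustrative choice $p = 3$; the grid and the name `ratio` are arbitrary, and the check is a sanity test rather than a proof.

```python
def ratio(x, p):
    """Left-hand side of the lemma: (|1+x|^p + |1-x|^p) / (1 + |x|^p)."""
    return (abs(1 + x) ** p + abs(1 - x) ** p) / (1 + abs(x) ** p)


p = 3.0
bound = 2 ** (p - 1)
grid = [k / 100.0 for k in range(-500, 501)]

# The ratio never exceeds 2^(p-1) on the grid, with equality at |x| = 1.
worst = max(ratio(x, p) for x in grid)
print(worst, bound)

# Integrated form from the proof: (1+2t)^p - 2(1+t)^p + 1 >= p * t^p for t >= 0.
thetas = [k / 100.0 for k in range(0, 301)]
slack = min((1 + 2 * t) ** p - 2 * (1 + t) ** p + 1 - p * t ** p for t in thetas)
print(slack)  # non-negative on the grid
```

Away from $|x| = 1$ the ratio drops strictly below $2^{p-1}$, which is the slack $\delta$ used in the second part of the lemma.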

Proof (of Lemma 6.2). The proof is a straightforward case analysis. Call $x$ (resp. $y$) 'bad' if
$|x| \notin [1 - \epsilon, 1 + \epsilon]$. Also, $b(x)$ denotes a predicate which is 1 if $x$ is bad and 0 otherwise.

Case 1. $(x, y)$ is good. The upper bound in this case is clear (using Lemma 6.3).

Case 2. Neither of $x, y$ is bad, but $xy > 0$. Using Lemma 6.3, we have $f(x, y) \le C \cdot 2^{p-1} + \epsilon$,
which is what we want.

Case 3. At least one of $x, y$ is bad (i.e., one of $b(x), b(y)$ is 1). In this case Lemma 6.3 gives

$$f(x, y) \le \frac{|x - y|^p + C \left( (1 + |x|^p)(2^{p-1} - \delta b(x)) + (1 + |y|^p)(2^{p-1} - \delta b(y)) \right)}{2 + |x|^p + |y|^p}$$

$$= C \cdot 2^{p-1} + \frac{|x - y|^p - C \left( \delta b(x)(1 + |x|^p) + \delta b(y)(1 + |y|^p) \right)}{2 + |x|^p + |y|^p}$$

Since $|x - y|^p \le 2^{p-1}(|x|^p + |y|^p)$ and one of $b(x), b(y) > 0$, we can choose $C$ large enough (depending
on $\delta$), so that $f(x, y) \le C \cdot 2^{p-1}$. □

Soundness. Assuming the lemma, let us see why the analysis of the No case follows. Suppose
the graph has a Max-Cut value at most $\rho$, i.e., every cut has at most $\rho \cdot nd/2$ edges. Now consider
the vector $x$ which maximizes $g(x_0, x_1, \ldots, x_n)$. It is easy to see that we may assume $x_0 \ne 0$, thus
we can scale the vector s.t. $x_0 = 1$. Let $S \subseteq V$ denote the set of 'good' vertices (i.e., vertices for
which $|x_i| \in (1 - \epsilon, 1 + \epsilon)$).

Lemma 6.4. The number of good edges is at most $\rho \cdot \frac{(|S| + n)d}{4}$.

Proof. Recall that good edges have both end-points in $S$, and further the corresponding $x$ values
have opposite signs. Thus the lemma essentially says that there is no cut in $S$ with more than
$\rho \cdot \frac{(|S| + n)d}{4}$ edges.

Suppose there is such a cut. At least $(n - |S|)d/2$ edges touch $V \setminus S$, and by greedily placing the
vertices of $V \setminus S$ on one of the sides of this cut we cut at least half of them. Thus we can extend it to a
cut of the entire graph with more than

$$\frac{\rho (|S| + n)d}{4} + \frac{(n - |S|)d}{4} = \frac{\rho n d}{2} + \frac{(1 - \rho)(n - |S|)d}{4} \ge \frac{\rho n d}{2}$$

edges, which is a contradiction. This gives the bound. □
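Let $N$ denote the numerator of Eq. (20). We have

$$N = \sum_{i \sim j} f(x_i, x_j) \left( 2 + |x_i|^p + |x_j|^p \right)$$

$$\le \sum_{i \sim j} C \cdot 2^{p-1} \left( 2 + |x_i|^p + |x_j|^p \right) + \sum_{i \sim j,\ \text{good}} 2^p (1 + \epsilon)$$

$$\le Cd \cdot 2^{p-1} \cdot \Big( n + \sum_i |x_i|^p \Big) + \frac{\rho d (n + |S|)}{4} \cdot 2^p (1 + \epsilon).$$

Now observe that the denominator is $n + \sum_i |x_i|^p \ge n + |S|(1 - \epsilon)^p$, from the definition of $S$. Thus
we obtain an upper bound on $g(x)$:

$$g(x) \le Cd \cdot 2^{p-1} + \frac{\rho d}{4} \cdot 2^p (1 + \epsilon)(1 - \epsilon)^{-p}.$$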

Hardness factor. In the Yes case, there is clearly an assignment of $\pm 1$ to the $x_i$ such that $g(x)$ is
at least $Cd \cdot 2^{p-1} + \frac{\rho' d}{4} \cdot 2^p$. Thus if $\epsilon$ is small enough (this will make us pick $C$ which is large), the
gap between the optimum values in the Yes and No cases can be made $\left( 1 + \frac{\Omega(1)}{C} \right)$, where the $\Omega(1)$
term is determined by the difference $\rho' - \rho$. This proves that the $p$-norm is hard to approximate to
some fixed constant factor.

Note. In the analysis, $\epsilon$ was chosen to be a small constant depending on $p$ and the gap between $\rho$
and $\rho'$; $C$ is a constant chosen large enough, depending on $\epsilon$.
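The Yes-case value can be checked on a tiny example. The sketch below assumes that $g$ has numerator $\sum_{i \sim j} |x_i - x_j|^p + Cd \sum_i (|x_0 + x_i|^p + |x_0 - x_i|^p)$ and denominator $n|x_0|^p + \sum_i |x_i|^p$; the 4-cycle, $p = 3$ and $C = 10$ are illustrative choices. The 4-cycle is bipartite, so a $\pm 1$ assignment cuts every edge and the cut fraction is 1.

```python
def g(x0, xs, edges, d, C, p):
    """Objective of the reduction for a d-regular graph with vertex values xs and anchor x0."""
    n = len(xs)
    cut_part = sum(abs(xs[i] - xs[j]) ** p for i, j in edges)
    anchor_part = C * d * sum(abs(x0 + t) ** p + abs(x0 - t) ** p for t in xs)
    return (cut_part + anchor_part) / (n * abs(x0) ** p + sum(abs(t) ** p for t in xs))


# Illustrative instance: the 4-cycle (2-regular), vertices 0..3.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
d, C, p = 2, 10, 3

cut_everything = g(1.0, [1.0, -1.0, 1.0, -1.0], edges, d, C, p)
cut_nothing = g(1.0, [1.0, 1.0, 1.0, 1.0], edges, d, C, p)

# Expected values: C*d*2^(p-1) + (cut fraction)*d*2^(p-2), with cut fraction 1 and 0.
print(cut_everything, C * d * 2 ** (p - 1) + d * 2 ** (p - 2))
print(cut_nothing, C * d * 2 ** (p - 1))
```

The difference between the two values is small relative to the common term $Cd \cdot 2^{p-1}$, which is why the reduction only yields a gap of the form $1 + \Omega(1)/C$ before amplification.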



The Instance. We have argued about the hardness of computing the function $g(x_0, x_1, \ldots, x_n)$
to some constant factor. This can be formulated as an instance of $p$-norm in a natural way. We
describe this formally (though this is clear, the formal description will be useful when arguing
about certain properties of the tensored instance which we need for proving hardness of $\|A\|_{q \mapsto p}$
for $p < q$).

First we do a simple change of variable and let $z = n^{1/p} x_0$. Now, we construct the $5|E| \times (n + 1)$
matrix $M$. For each edge $e = (i, j)$ in $E(G)$, we have five rows in $M$. Let the column indices $\ell$ run
from $0 \le \ell \le n$.

$$M_{e,1}(\ell) = \begin{cases} 1 & \text{if } \ell = i \\ -1 & \text{if } \ell = j \\ 0 & \text{otherwise} \end{cases}$$

$$M_{e,2}(\ell) = C^{1/p} \cdot \begin{cases} n^{-1/p} & \text{if } \ell = 0 \\ -1 & \text{if } \ell = i \\ 0 & \text{otherwise} \end{cases}$$

$$M_{e,3}(\ell) = C^{1/p} \cdot \begin{cases} n^{-1/p} & \text{if } \ell = 0 \\ 1 & \text{if } \ell = i \\ 0 & \text{otherwise} \end{cases}$$

We have two similar rows $M_{e,4}$ and $M_{e,5}$, where we have corresponding values with $j$ instead
of $i$. It is easy to see that $\|Mx\|_p^p / \|x\|_p^p$, with $x = (z, x_1, \ldots, x_n)$, takes the same value as $g$. Further in the
Yes case, there is a vector $x = (n^{1/p}, x_1, x_2, \ldots, x_n)$ with $x_i = \pm 1$ that attains the high value
$(C \cdot d \cdot 2^{p-1} + \rho' d \cdot 2^{p-2})$.
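The matrix formulation can be sanity-checked on a small graph. The sketch below is built on assumptions: it takes the five rows per edge to be one difference row ($+1$ at $i$, $-1$ at $j$) and four rows of the form $C^{1/p}(n^{-1/p}, \pm 1)$ supported on column 0 and on one endpoint. The scaling $C^{1/p}$ is inferred from the requirement that $\|Mx\|_p^p / \|x\|_p^p$ should reproduce $g$. The 4-cycle, $p = 3$ and $C = 10$ are illustrative.

```python
def build_instance(n, edges, C, p):
    """Rows of the 5|E| x (n+1) matrix; column 0 holds the rescaled variable z = n^(1/p) * x_0.

    Vertices are numbered 1..n. The C^(1/p) scaling of the anchor rows is an assumption.
    """
    c, s = C ** (1.0 / p), n ** (-1.0 / p)
    rows = []
    for i, j in edges:
        diff = [0.0] * (n + 1)
        diff[i], diff[j] = 1.0, -1.0
        rows.append(diff)
        for v in (i, j):
            for sign in (-1.0, 1.0):
                row = [0.0] * (n + 1)
                row[0], row[v] = c * s, c * sign
                rows.append(row)
    return rows


def ratio_p(M, x, p):
    """||Mx||_p^p / ||x||_p^p."""
    Mx = [sum(a * t for a, t in zip(row, x)) for row in M]
    return sum(abs(t) ** p for t in Mx) / sum(abs(t) ** p for t in x)


n, C, p = 4, 10, 3
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]          # 4-cycle, degree d = 2
M = build_instance(n, edges, C, p)
x = [n ** (1.0 / p), 1.0, -1.0, 1.0, -1.0]        # z = n^(1/p), alternating +-1 labels
print(len(M), ratio_p(M, x, p))                   # 20 rows; value C*d*2^(p-1) + d*2^(p-2) = 84
```

With this scaling the alternating assignment on the 4-cycle gives the same value, 84, as evaluating $g$ directly with $x_0 = 1$.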
6.2 Amplifying the gap by tensoring

We observe that the matrix $p \mapsto p$-norm is multiplicative under tensoring. More precisely,

Lemma 6.5. Let $M$, $N$ be square matrices with dimensions $m \times m$ and $n \times n$ respectively, and let
$p \ge 1$. Then $\|M \otimes N\|_p = \|M\|_p \cdot \|N\|_p$.

The tensor product $M \otimes N$ is defined in the standard way - we think of it as an $m \times m$ matrix
of blocks, with the $(i, j)$th block being a copy of $N$ scaled by $m_{ij}$. It is well-known that eigenvalues
($p = 2$) multiply under tensoring. We note that it is crucial that we consider $\|A\|_p$. Matrix norms
$\|A\|_{q \mapsto p}$ for $p \ne q$ do not in general multiply upon tensoring.
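The block definition of the tensor product, and the identity behind the easy direction of Lemma 6.5, can be illustrated with a few lines of Python. The matrices, vectors and the value $p = 3$ below are arbitrary examples; `kron` and `kron_vec` are hypothetical helper names.

```python
def kron(M, N):
    """Tensor (Kronecker) product: an m x m grid of blocks, block (i, j) = m_ij * N."""
    return [[a * b for a in row_m for b in row_n] for row_m in M for row_n in N]


def kron_vec(x, y):
    return [a * b for a in x for b in y]


def matvec(A, v):
    return [sum(a * t for a, t in zip(row, v)) for row in A]


def p_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


M = [[1.0, 2.0], [0.0, 3.0]]
N = [[2.0, -1.0], [1.0, 1.0]]
x, y, p = [1.0, -2.0], [0.5, 3.0], 3

lhs = matvec(kron(M, N), kron_vec(x, y))          # (M (x) N)(x (x) y)
rhs = kron_vec(matvec(M, x), matvec(N, y))        # (Mx) (x) (Ny)
print(lhs, rhs)
print(p_norm(lhs, p), p_norm(matvec(M, x), p) * p_norm(matvec(N, y), p))
```

Since the $p$-norm of a tensor product of vectors is the product of the $p$-norms, this identity already gives $\|M \otimes N\|_p \ge \|M\|_p \cdot \|N\|_p$; the reverse inequality is the part that needs $q = p$.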

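Proof. Let $\lambda(A)$ denote the $p$-norm of a matrix $A$. Let us first show the easy direction, that
$\lambda(M \otimes N) \ge \lambda(M) \cdot \lambda(N)$. Suppose $x, y$ are the unit vectors which 'realize' the $p$-norm for $M, N$
respectively. Then

$$\|(M \otimes N)(x \otimes y)\|_p^p = \sum_{i,j} |(M_i \cdot x)(N_j \cdot y)|^p = \Big( \sum_i |M_i \cdot x|^p \Big) \Big( \sum_j |N_j \cdot y|^p \Big) = \lambda(M)^p \cdot \lambda(N)^p$$

Also $\|x \otimes y\|_p = \|x\|_p \cdot \|y\|_p$, thus the inequality follows.

Let us now show the other direction, i.e., $\lambda(M \otimes N) \le \lambda(M) \cdot \lambda(N)$. For this direction we write $A, B$
for $M, N$. Let $x, z$ be $mn$-dimensional vectors such that $z = (A \otimes B)x$. We will think of $x, z$ as being
divided into $m$ blocks of size $n$ each. Let $X$ denote the $m \times n$ matrix whose $i$th row is the $i$th block of $x$
(and similarly define $Z$). At the expense of abusing notation, let $A_j$ refer to the $j$th row of matrix
$A$, and let $X_k$ refer to the $k$th column of $X$. Also, let the element-wise $p$-norm of a matrix $M$ be defined as

$$|M|_{\odot p} = \Big( \sum_{i,j} |m_{ij}|^p \Big)^{1/p}.$$

It is easy to observe that $Z = A X B^T$. Further, $\|z\|_p = |Z|_{\odot p}$.

We now expand out $Z$ and rearrange the terms to separate out the operations of $B$ and $A$, in
order to bound $|Z|_{\odot p}$ using the $p$-norms of $A$ and $B$. Hence, we have

$$|Z|_{\odot p}^p = \sum_{i=1}^m \sum_{j=1}^n \left| (A_i X_1, A_i X_2, \ldots, A_i X_n)\, B_j^T \right|^p = \sum_{i=1}^m \left\| B\, (A_i X_1, \ldots, A_i X_n)^T \right\|_p^p$$

But from definition, $\|Bx\|_p^p = \sum_k |B_k x|^p \le \lambda(B)^p \|x\|_p^p$.
Hence, by applying the operator $p$-norm bound of $B$,

$$|Z|_{\odot p}^p \le \lambda(B)^p \sum_{i=1}^m \sum_{k=1}^n |A_i X_k|^p = \lambda(B)^p \sum_k \Big( \sum_i |A_i X_k|^p \Big)$$

$$\le \lambda(B)^p \lambda(A)^p \sum_k \|X_k\|_p^p$$

$$= \lambda(A)^p \lambda(B)^p\, |X|_{\odot p}^p$$

Since $|X|_{\odot p} = \|x\|_p = 1$, we have $|Z|_{\odot p} \le \lambda(A) \lambda(B)$. □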

Note. We note that the tensoring result is stated for square matrices, while the instances above
are rectangular. This is not a problem, because we can pad the matrix with 0s to make it square
without changing the value of the norm.

The tensored matrix and amplification. Consider any constant $\gamma > 0$. We consider the
instance of the matrix $M$ obtained in the proof of Proposition 6.1 and repeatedly tensor it with itself
$k = \log_\eta \gamma$ times to obtain $M' = M^{\otimes k}$. From Lemma 6.5, there exist $\tau_C$ and $\tau_S$ with $\tau_C / \tau_S > \gamma$
such that in the No case, for every vector $y \in \mathbb{R}^{(n+1)^k}$, $\|M' y\|_p \le \tau_S$.

Further, in the Yes case, there is a vector $y' = (n^{1/p}, x_1, x_2, \ldots, x_n)^{\otimes k}$ where $x_i = \pm 1$ (for
$i = 1, 2, \ldots, n$) such that $\|M' y'\|_p \ge \tau_C$.

Note: Our techniques work even when we take the tensor product $\log^c n$ times for some constant $c$.
Thus we can conclude:

Theorem 6.6. For any $\gamma > 0$ and $p > 2$, it is NP-hard to approximate the $p$-norm of a matrix
within a factor $\gamma$. Also, it is hard to approximate the matrix $p$-norm to a factor of $\Omega(2^{(\log n)^{1 - \epsilon}})$
for any constant $\epsilon > 0$, unless $NP \subseteq DTIME(2^{\mathrm{polylog}(n)})$.



Properties of the tensored instance: We now establish some structure about the tensored
instance, which we will use crucially for the hardness of the $q \mapsto p$ norm. Let the entries in vector $y'$
be indexed by $k$-tuples $I = (i_1, i_2, \ldots, i_k)$ where $i_j \in \{0, 1, \ldots, n\}$. It is easy to see that

$$y'_I = \pm n^{w(I)/p}, \quad \text{where } w(I) \text{ is the number of 0s in the tuple } I.$$

Let us introduce variables $x_I = n^{-w(I)/p}\, y_I$, where $w(I)$ = number of 0s in tuple $I$. It is easy to
observe that there is a matrix $B$ such that

$$g'(x) = \frac{\|Bx\|_p}{\left( \sum_I n^{w(I)} |x_I|^p \right)^{1/p}} = \frac{\|M' y\|_p}{\|y\|_p}$$

Further, it can also be seen that in the Yes case, there is a $\pm 1$ assignment for $x_I$ which attains the
value $g'(x) = \tau_C$.



6.3 Approximating $\|A\|_{q \mapsto p}$ when $p \ne q$

Let us now consider the question of approximating $\|A\|_{q \mapsto p}$. The idea is to use the hardness of
approximating $\|A\|_{p \mapsto p}$. We observed in the previous section that the technique of amplifying
hardness for computing the $q \mapsto p$-norm by tensoring a (small) constant factor hardness does not
work when $q \ne p$. However, we show that we can obtain such amplified label-cover like hardness
if the instance has some additional structure. In particular, we show that the instances that we obtain
from tensoring the hard instances of $\|A\|_p$ can be transformed to give such hard instances for
$\|A\|_{q \mapsto p}$.

We illustrate the main idea by first showing a (small) constant factor hardness: let us start
with the following maximization problem (which is very similar to Eqn. (20))

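$$g(x_0, x_1, \ldots, x_n) = \frac{\left( \sum_{i \sim j} |x_i - x_j|^p + Cd \cdot \sum_i \left( |x_0 + x_i|^p + |x_0 - x_i|^p \right) \right)^{1/p}}{\left( n |x_0|^q + \sum_i |x_i|^q \right)^{1/q}}. \qquad (22)$$

Notice that $x_0$ is now 'scaled differently' than in Eq. (20). This is crucial. Now, in the Yes case,
we have

$$\max_x g(x) \ge \frac{\left( \rho'(nd/2) \cdot 2^p + Cnd \cdot 2^p \right)^{1/p}}{(2n)^{1/q}}.$$

Indeed, there exists a $\pm 1$ solution which has value at least the RHS. Let us write $\mathcal{N}$ for the
numerator of Eq. (22). Then

$$g(x) = \frac{\mathcal{N}}{\left( n|x_0|^p + \sum_i |x_i|^p \right)^{1/p}} \times \frac{\left( n|x_0|^p + \sum_i |x_i|^p \right)^{1/p}}{\left( n|x_0|^q + \sum_i |x_i|^q \right)^{1/q}}.$$

Suppose we started with a No instance. The proof of the $q = p$ case implies that the first term in
this product is at most (to a $(1 + \epsilon)$ factor)

$$\frac{\left( \rho(nd/2) \cdot 2^p + Cnd \cdot 2^p \right)^{1/p}}{(2n)^{1/p}}.$$

Now, we note that the second term is at most $(2n)^{1/p} / (2n)^{1/q}$. This follows because for any
vector $y \in \mathbb{R}^n$, we have $\|y\|_p / \|y\|_q \le n^{(1/p) - (1/q)}$. We can use this with the $2n$-dimensional vector
$(x_0, \ldots, x_0, x_1, x_2, \ldots, x_n)$ to see the desired claim.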
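The norm comparison $\|y\|_p / \|y\|_q \le n^{(1/p) - (1/q)}$ for $p \le q$ is tight exactly when all entries have equal magnitude, which is why $\pm 1$ solutions matter in the argument. A quick numerical illustration follows; the test vectors and the values $p = 3$, $q = 5$ are arbitrary, and `norm_ratio` is a hypothetical helper name.

```python
def p_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def norm_ratio(v, p, q):
    """||v||_p / ||v||_q, which is at most n^(1/p - 1/q) when p <= q."""
    return p_norm(v, p) / p_norm(v, q)


p, q, n = 3, 5, 4
bound = n ** (1.0 / p - 1.0 / q)

vectors = [
    [1.0, 1.0, 1.0, 1.0],      # equal magnitudes: attains the bound
    [1.0, -1.0, 1.0, -1.0],    # signs do not matter
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 0.0, 0.0, 0.0],      # a single spike: ratio 1
]
for v in vectors:
    print(v, norm_ratio(v, p, q), bound)
```

Vectors with unequal magnitudes fall strictly below the bound, so in the No case the second factor can only help the upper bound.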

From this it follows that in the No case, the optimum is at most (up to a $(1 + \epsilon)$ factor)

$$\frac{\left( \rho(nd/2) \cdot 2^p + Cnd \cdot 2^p \right)^{1/p}}{(2n)^{1/q}}.$$

This proves that there exists an $\alpha > 1$ s.t. it is NP-hard to approximate $\|A\|_{q \mapsto p}$ to a factor better
than $\alpha$.

A key property we used in the above argument is that in the Yes case, there exists a $\pm 1$ solution
for the $x_i$ ($i > 0$) which has a large value. It turns out that this is the only property we need. More
precisely, suppose $A$ is an $n \times n$ matrix, and let $\alpha_i$ be positive integers (we will actually use the fact that
they are integers, though it is not critical). Now consider the optimization problem $\max_{y \in \mathbb{R}^n} g(y)$,
with

$$g(y) = \frac{\|Ay\|_p}{\left( \sum_i \alpha_i |y_i|^p \right)^{1/p}} \qquad (23)$$

In the previous section, we established the following claim from the proof of Theorem 6.6.

Claim 6.7. For any constant $\gamma > 1$, there exist thresholds $\tau_C$ and $\tau_S$ with $\tau_C / \tau_S > \gamma$, such that it
is NP-hard to distinguish between:
Yes case. There exists a $\pm 1$ assignment to the $y_i$ in (23) with value at least $\tau_C$, and
No case. For all $y \in \mathbb{R}^n$, $g(y) \le \tau_S$.

Proof. Follows from the structure of the tensor product instance.

We can now show that Claim 6.7 implies the desired result.
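Theorem 6.8. It is NP-hard to approximate $\|A\|_{q \mapsto p}$ to any fixed constant $\gamma$ for $q \ge p > 2$.

Proof. As in the previous proof (Eq. (22)), consider the optimization problem $\max_{y \in \mathbb{R}^n} h(y)$, with

$$h(y) = \frac{\|Ay\|_p}{\left( \sum_i \alpha_i |y_i|^q \right)^{1/q}} \qquad (24)$$

By definition,

$$h(y) = g(y) \cdot \frac{\left( \sum_i \alpha_i |y_i|^p \right)^{1/p}}{\left( \sum_i \alpha_i |y_i|^q \right)^{1/q}} \qquad (25)$$

Completeness. Consider the value of $h(y)$ for $A, \alpha_i$ in the Yes case for Claim 6.7. Let $y$ be a $\pm 1$
solution with $g(y) \ge \tau_C$. Because the $y_i$ are $\pm 1$, it follows that

$$h(y) \ge \tau_C \cdot \Big( \sum_i \alpha_i \Big)^{(1/p) - (1/q)}.$$

Soundness. Now suppose we start with an $A, \alpha_i$ in the No case for Claim 6.7.

First, note that the second term in Eq. (25) is at most $\left( \sum_i \alpha_i \right)^{(1/p) - (1/q)}$. To see this, we note
that the $\alpha_i$ are positive integers. Thus by considering the vector $(y_1, \ldots, y_1, y_2, \ldots, y_2, \ldots)$ (where $y_i$ is
duplicated $\alpha_i$ times), and using $\|u\|_p / \|u\|_q \le m^{(1/p) - (1/q)}$ for $u \in \mathbb{R}^m$, we get the desired inequality.

This gives that for all $y \in \mathbb{R}^n$,

$$h(y) \le g(y) \cdot \Big( \sum_i \alpha_i \Big)^{(1/p) - (1/q)} \le \tau_S \cdot \Big( \sum_i \alpha_i \Big)^{(1/p) - (1/q)}.$$

This proves that we cannot approximate $h(y)$ to a factor better than $\tau_C / \tau_S$, which can be
made an arbitrarily large constant by Claim 6.7. This finishes the proof, because the optimization
problem $\max_{y \in \mathbb{R}^n} h(y)$ can be formulated as a $q \mapsto p$ norm computation for an appropriate matrix
as earlier. □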

Note that this hardness instance is not obtained by tensoring the $q \mapsto p$ norm hardness instance.
It is instead obtained by considering the $\|A\|_p$ hardness instance and transforming it suitably.



6.4 Approximating $\|A\|_{\infty \mapsto p}$

The problem of computing the $\infty \mapsto p$ norm of a matrix $A$ turns out to have a very natural and
elegant statement in terms of the column vectors of the matrix $A$. We first introduce the following
problem:

Definition 6.9 (Longest Vector Problem). Let $v_1, v_2, \ldots, v_n$ be vectors over $\mathbb{R}$. The Longest
Vector problem asks for

$$\max_{x \in \{-1, 1\}^n} \Big\| \sum_i x_i v_i \Big\|_p$$
Note that this problem differs from the well-studied Shortest Vector Problem [Kho04] for lattices,
which has received a lot of attention in the cryptography community over the last decade
[Reg06]. The shortest vector problem asks for minimizing the same objective in Definition 6.9
when $x_i \in \mathbb{Z}$.

We now observe that computing the $\infty \mapsto p$ norm of the matrix is equivalent to finding the
length of the longest vector, where the vectors $v_i$ are the columns of $A$.

Observation 6.10. Computing the $\|A\|_{\infty \mapsto p}$ norm of a matrix is equivalent to computing the length
of the longest vector in the Longest Vector problem where the vectors are the column vectors of $A$.

Proof. First note that $\|Ax\|_p = \|\sum_i x_i a_i\|_p$. The observation follows by noticing that this is
maximized when $|x_i| = 1$ for all $i$. □
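Because $\|Ax\|_p$ is a convex function of $x$, its maximum over the cube $\|x\|_\infty \le 1$ is attained at a vertex, so for tiny instances the $\infty \mapsto p$ norm can be found by enumerating sign vectors. The brute-force sketch below takes exponential time and is meant only as an illustration; the matrix and the name `longest_vector` are illustrative.

```python
from itertools import product


def p_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def longest_vector(A, p):
    """max over x in {-1,+1}^n of ||sum_i x_i a_i||_p, where a_i are the columns of A.

    Brute force over all 2^n sign patterns; only sensible for very small n.
    """
    n = len(A[0])
    best = 0.0
    for signs in product((-1.0, 1.0), repeat=n):
        Ax = [sum(a * s for a, s in zip(row, signs)) for row in A]
        best = max(best, p_norm(Ax, p))
    return best


A = [[1.0, 2.0], [3.0, -4.0]]      # columns (1, 3) and (2, -4)
value = longest_vector(A, 3)
print(value)                        # attained at x = (1, -1), where Ax = (-1, 7)
```

The hardness results of this section say that, for $p > 2$, no polynomial time method can approximate this maximum well in general, so enumeration of this kind cannot be replaced by an efficient exact procedure unless the stated complexity assumptions fail.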

The $\infty \mapsto p$ norm of the matrix also seems like a natural extension of the Grothendieck
problem [AN04, KNS08]. When $p = 1$, we obtain the original Grothendieck problem, and the
$p = 2$ case is the $\ell_2$ Grothendieck problem and maximizes the quadratic form for p.s.d. matrices.
Further, as mentioned earlier there is a constant factor approximation for $1 < p < 2$ using
[Nes98]. However, for $p > 2$, we show that there is $\Omega(2^{(\log n)^{1 - \epsilon}})$ hardness for computing the $\infty \mapsto p$
norm assuming NP does not have quasipolynomial time algorithms, using the same techniques as in
Theorem 6.8.

Theorem 6.11. It is NP-hard to approximate $\|A\|_{\infty \mapsto p}$ to any constant $\gamma$ for $p > 2$, and hard to
approximate within a factor of $\Omega(2^{(\log n)^{1 - \epsilon}})$ for any constant $\epsilon > 0$, assuming
$NP \not\subseteq DTIME(2^{\mathrm{polylog}(n)})$.

The proof of Theorem 6.8 also works out for $q = \infty$ by noting that the second expression in
Eq. (25) is instead $\max_y \frac{(\sum_i \alpha_i |y_i|^p)^{1/p}}{\max_i |y_i|}$, which is also maximized when $|y_i| = 1$ for all $i$.
A Miscellany

A.1 Non-convex optimization

Note that computing the $p \mapsto p$ norm is in general not a convex optimization problem, i.e., the
function $f$ defined by $f(x) = \frac{\|Ax\|_p}{\|x\|_p}$ is not in general concave. For example, consider




In this case, with $p = 2.5$, for instance, it is easy to check that $f((x + y)/2) < (f(x) + f(y))/2$.
Thus $f$ is not concave. However, it could still be that $f$ raised to a certain power is concave.



A.2 Duality

The following equality is useful in 'moving' from one range of parameters to another. We use the
fact that $\|x\|_p = \max_{\|y\|_{p'} = 1} y^T x$, where $p'$ is the 'dual norm', satisfying $1/p + 1/p' = 1$ (similarly
$q'$ denotes the dual norm of $q$).

$$\|A\|_{q \mapsto p} = \max_{\|x\|_q = 1} \|Ax\|_p = \max_{\substack{\|x\|_q = 1 \\ \|y\|_{p'} = 1}} y^T A x = \max_{\substack{\|x\|_q = 1 \\ \|y\|_{p'} = 1}} x^T A^T y = \|A^T\|_{p' \mapsto q'} \qquad (26)$$
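The duality relation can be checked by brute force in one special case. For $q = \infty$ and $p = 1$ the dual exponents are $q' = 1$ and $p' = \infty$, so the identity reads $\|A\|_{\infty \mapsto 1} = \|A^T\|_{\infty \mapsto 1}$, and both sides are maxima of a convex function over sign vectors. The matrix below is an arbitrary example and `inf_to_one_norm` is a hypothetical helper name.

```python
from itertools import product


def inf_to_one_norm(A):
    """max over x in {-1,+1}^n of ||Ax||_1, by enumerating all sign vectors."""
    n = len(A[0])
    return max(
        sum(abs(sum(a * s for a, s in zip(row, signs))) for row in A)
        for signs in product((-1.0, 1.0), repeat=n)
    )


def transpose(A):
    return [list(col) for col in zip(*A)]


A = [[1.0, -2.0, 3.0], [4.0, 5.0, -6.0]]
print(inf_to_one_norm(A), inf_to_one_norm(transpose(A)))  # both sides of the duality identity
```

Both quantities equal $\max_{x, y \in \{\pm 1\}} y^T A x$, which makes the symmetry under transposition visible directly.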



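A.3 Moving to a positive matrix

We now show that by adding a very small positive number to each entry of the matrix, the $q \mapsto p$-norm
does not change much.

Lemma A.1. Let $A$ be an $n \times n$ matrix where the maximum entry is scaled to 1. Let $J_\epsilon$ be the
matrix with all entries being $\epsilon$. Then $\|A + J_\epsilon\|_{q \mapsto p} \le \|A\|_{q \mapsto p} \left( 1 + \epsilon\, n^{1 + \frac{1}{p} - \frac{1}{q}} \right)$.

Proof. We first note that $\|A\|_{q \mapsto p} \ge 1$ (because the maximum entry is 1). It is also easy to see that
$\|J_\epsilon x\|_p / \|x\|_q$ is maximized by the vector with all equal entries. Hence $\|J_\epsilon\|_{q \mapsto p} = n^{1 + \frac{1}{p} - \frac{1}{q}}\, \epsilon$. Hence, by using
the fact that $\|\cdot\|_{q \mapsto p}$ is a norm, the lemma follows. □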
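The value of the all-$\epsilon$ matrix on the all-ones vector can be computed directly: each entry of $J_\epsilon \mathbf{1}$ is $\epsilon n$, so the ratio is $\epsilon n \cdot n^{1/p} / n^{1/q}$. The sketch below confirms this numerically and compares against a few other non-negative vectors; the values of $n$, $\epsilon$, $p$, $q$ and the test vectors are arbitrary illustrative choices.

```python
def p_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def j_ratio(x, eps, p, q):
    """||J_eps x||_p / ||x||_q for the n x n matrix J_eps with every entry equal to eps."""
    n = len(x)
    Jx = [eps * sum(x)] * n            # every row of J_eps computes eps * (sum of x)
    return p_norm(Jx, p) / p_norm(x, q)


n, eps, p, q = 4, 0.01, 3, 5
formula = eps * n ** (1.0 + 1.0 / p - 1.0 / q)

print(j_ratio([1.0] * n, eps, p, q), formula)     # all-equal entries attain the formula
print(j_ratio([1.0, 2.0, 3.0, 4.0], eps, p, q))   # strictly smaller
print(j_ratio([1.0, 0.0, 0.0, 0.0], eps, p, q))   # strictly smaller
```

Since the perturbation has norm polynomial in $n$ times $\epsilon$, choosing $\epsilon$ inverse-polynomially small changes the norm of a matrix with maximum entry 1 by only a $(1 + o(1))$ factor.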